
Still under construction

Using the SLURM batch job facility


Common Commands

  1. slurm (introduction)
  2. interactive jobs: salloc (interactive jobs only) and srun (interactive jobs and tasks)
  3. batch jobs: sbatch (submit batch jobs to the queue)
  4. scancel (signal or cancel jobs)
  5. sstat (status information about running jobs)
  6. squeue (show the job queues)
  7. sinfo (show node and partition information)
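
The commands above are typically used together over the life of a job. The following is a minimal sketch, not a site-specific recipe: the script name job.sh and the job ID 12345 are hypothetical, and the run helper is only an illustration that prints each command instead of executing it unless DRY_RUN=0 is set.

```shell
#!/usr/bin/env bash
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 on a machine with SLURM installed to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Illustrative helper (not part of SLURM): print or execute a command.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}

me="${USER:-$(id -un)}"   # current user name
jobid=12345               # hypothetical job ID; sbatch prints the real one on submission

run sinfo                                   # overview of partitions and nodes
run sbatch job.sh                           # submit the (hypothetical) batch script job.sh
run squeue -u "$me"                         # list your own pending and running jobs
run sstat -j "${jobid}.batch" --format=JobID,MaxRSS,AveCPU   # usage of the running batch step
run scancel "$jobid"                        # cancel the job
```

In real use the job ID comes from the output of sbatch rather than being hard-coded.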


Manpages

  • man 1 sbatch
  • man 1 srun
  • man 1 salloc
  • man 1 scancel
  • man 1 sstat
  • man 1 squeue
  • man 1 sinfo


Common terms

  • cpu: a hardware processor core (a physical core, not a logical CPU; hyperthreads are not counted)
  • node: a computing host
  • queue: the scheduling of jobs on a partition (group of nodes)
  • partition: a group of nodes with similar computing capabilities
  • task: a process run within a SLURM job context (it uses the job's resources and does not allocate any itself)
  • allocation: a reservation of resources for an interactive or batch job
  • job: a collection of job steps (needs a resource allocation)
  • step: a single command run as part of a job
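
How job, step and task relate can be sketched with a small batch script: the #SBATCH lines request the allocation for the job, and each srun call inside it starts one job step that runs the given command once per task. This is only an illustration; the job name and limits are made up, and the launch helper is an assumption added so the script also runs outside SLURM.

```shell
#!/usr/bin/env bash
#SBATCH -J steps-demo        # job name (hypothetical)
#SBATCH -n 2                 # allocation with room for 2 tasks
#SBATCH -t 00:05:00          # wall clock limit of 5 minutes

# Illustrative helper: inside a SLURM job, start the command as a job step
# with srun; outside SLURM, just run it directly.
launch() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        srun "$@"
    else
        "$@"
    fi
}

# Step 0: each of the 2 tasks prints the name of the node it runs on.
launch hostname

# Step 1: a second command, again executed once per task.
launch echo "step finished"
```

Submitted with sbatch, this gives one job with two steps of two tasks each; no step requests resources of its own.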


Important switches for sbatch/srun/salloc

  1. -a <index_range>  job array with index range, e.g. -a 0,1,4-6 or -a 0-8:2 (environment variables available: SLURM_ARRAY_TASK_ID, SLURM_ARRAY_JOB_ID, SLURM_ARRAY_TASK_COUNT etc.)
  2. -d <dependency_list>  define dependencies on other jobs (dependency_list: after:job_id[:job_id] or afterok:job_id[:job_id] and so on)
  3. --begin=<time_string>  (absolute or relative) start time of the job (see man 1 sbatch)
  4. --deadline=<time_string>  the job is only scheduled if it can finish before the deadline
  5. --mem=<amount[T|G|M|K]>  amount of real memory per node to allocate, e.g. 1G (1 gibibyte)
  6. -t <time>  set the job's wall clock time limit
  7. -e <file>  where to redirect stderr
  8. -o <file>  where to redirect stdout
  9. -i <file>  read the job's stdin from file
  10. -J <job_name>  set the name of the job
  11. -c <cpus_per_task>  number of CPUs (e.g. for threads) per task
  12. -n <ntasks>  total number of tasks (across all nodes)
  13. --ntasks-per-node=<ntasks>  number of tasks launched on each node
  14. -N <count>  number of nodes to allocate; to request specific nodes use -w <node_list>, where <node_list> is e.g. node001; node[001-005,009]; node001,node008
  15. --mail-type=<types>  NONE, BEGIN, END, FAIL, REQUEUE, ALL, ARRAY_TASKS etc., for further explanations see man 1 sbatch
  16. -v  run in verbose mode
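
Several of these switches can be placed at the top of a batch script as #SBATCH lines. The following is a sketch of a job array script under stated assumptions: the job name, the resource values and the input_<index>.dat naming scheme are hypothetical. The variables fall back to defaults so the script can also be tried outside SLURM.

```shell
#!/usr/bin/env bash
#SBATCH -J array-demo                # job name (hypothetical)
#SBATCH -a 0-8:2                     # job array with indices 0, 2, 4, 6, 8
#SBATCH -t 00:10:00                  # wall clock limit of 10 minutes
#SBATCH --mem=1G                     # 1 GiB of memory per node
#SBATCH -c 2                         # 2 CPUs per task
#SBATCH -o array-demo_%A_%a.out      # stdout; %A = array job ID, %a = array index
#SBATCH -e array-demo_%A_%a.err      # stderr, one file per array element

# SLURM sets these variables for each array element; the defaults only
# apply when the script is run by hand outside SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

# Hypothetical naming scheme: one input file per array index.
input="input_${task_id}.dat"

echo "array element ${task_id}: would process ${input} using ${cpus} CPU(s)"
```

Options given on the sbatch command line take precedence over the #SBATCH lines in the script, so the same script can be reused with a different index range, e.g. sbatch -a 0,1,4-6 followed by the script name.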


Useful combinations of arguments for srun/sbatch/salloc

  1. -N and -n and --ntasks-per-node= (nodes, total tasks, tasks on each node)
  2. -N and -n and -c (nodes, total tasks, CPUs per task)
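
The arithmetic behind these combinations can be sketched as follows. The numbers are made up, job.sh is a hypothetical script name, and the per-node values assume that tasks are spread evenly over the nodes, which SLURM does not guarantee unless it is requested.

```shell
#!/usr/bin/env bash
# Hypothetical request: 2 nodes, 8 tasks in total, 4 CPUs per task.
nodes=2
ntasks=8
cpus_per_task=4

# Assuming an even distribution of tasks over the nodes:
tasks_per_node=$(( ntasks / nodes ))
cpus_per_node=$(( tasks_per_node * cpus_per_task ))

# Total number of CPUs (cores) the job occupies:
total_cpus=$(( ntasks * cpus_per_task ))

echo "tasks per node: ${tasks_per_node}, CPUs per node: ${cpus_per_node}, CPUs in total: ${total_cpus}"

# The matching command lines (printed here, not executed):
echo "sbatch -N ${nodes} -n ${ntasks} --ntasks-per-node=${tasks_per_node} job.sh"
echo "sbatch -N ${nodes} -n ${ntasks} -c ${cpus_per_task} job.sh"
```

A request only fits if the CPUs needed per node do not exceed the number of cores of the nodes in the chosen partition; sinfo shows what is available.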


hpc/hpc_tubit/slurm_usage.txt · Last modified: 2018/09/01 11:55 by kraus
CC Attribution-Noncommercial-Share Alike 4.0 International