Still under construction
Using the SLURM batch job facility
Common Commands
- slurm (introduction)
- interactive jobs: salloc (interactive jobs only) and srun (interactive jobs and tasks)
- batch jobs: sbatch (submit batch jobs to the queue)
- scancel (send signals to jobs or cancel them)
- sstat (status information on running jobs)
- squeue (show the jobs in the queues)
- sinfo (show node and partition information)
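The commands above can also be driven from a script. The following Python sketch assumes it runs on a login node where sbatch, squeue and scancel are on the PATH. The helper names are hypothetical. It relies on sbatch --parsable, which prints only the job id, or "job_id;cluster" on multi-cluster setups.

```python
import subprocess
from typing import Optional, Tuple


def parse_parsable(output: str) -> Tuple[str, Optional[str]]:
    """Split 'sbatch --parsable' output ('<job_id>' or '<job_id>;<cluster>')."""
    job_id, _, cluster = output.strip().partition(";")
    return job_id, (cluster or None)


def submit(script: str) -> str:
    """Submit a batch script with sbatch and return its job id (hypothetical helper)."""
    result = subprocess.run(
        ["sbatch", "--parsable", script],
        capture_output=True, text=True, check=True,
    )
    job_id, _ = parse_parsable(result.stdout)
    return job_id


def show(job_id: str) -> str:
    """Return squeue's output for one job; raises CalledProcessError if squeue fails."""
    result = subprocess.run(
        ["squeue", "-j", job_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def cancel(job_id: str) -> None:
    """Cancel the job with scancel (hypothetical helper)."""
    subprocess.run(["scancel", job_id], check=True)
```

For example, job_id = submit("job.sh") followed by print(show(job_id)) and, if needed, cancel(job_id). Here job.sh is a placeholder for an actual batch script.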
Manpages
man 1 sbatch
man 1 srun
man 1 salloc
man 1 scancel
man 1 squeue
man 1 sinfo
Common terms
- cpu: a hardware processor core (not a logical CPU; no hyperthreading)
- node: computing host
- queue: scheduling of jobs on a partition (group of nodes)
- partition: a group of nodes with similar computing capabilities
- task: a process run within a SLURM job context (does not allocate resources itself)
- allocation: reservation of resources for an interactive or batch job
- job: a collection of job steps (requires a resource allocation)
- step: a single command run as part of a job
Important switches for sbatch/srun/salloc
- -a <index_range>: job array with index range, e.g. -a 0,1,4-6 or -a 0-8:2 (environment variables available: SLURM_ARRAY_TASK_ID, SLURM_ARRAY_JOB_ID, SLURM_ARRAY_TASK_COUNT, etc.)
- -d <dependency_list>: define dependencies on other jobs (dependency_list: after:job_id[:job_id], afterok:job_id[:job_id], and so on)
- --begin=<time_string>: absolute or relative start time of the job (see man 1 sbatch)
- --deadline=<time_string>: the job is only scheduled if it can finish before the deadline
- --mem=<amount[T|G|M|K]>: amount of resident memory per node to allocate, e.g. 1G (1 gibibyte)
- -t <time>: set the job's wall clock time limit
- -e <file>: file to redirect stderr to
- -o <file>: file to redirect stdout to
- -i <file>: read stdin for the job script from file
- -J <job_name>: set the job name
- -c <cpus_per_task>: number of CPUs (threads) per task
- -n <ntasks>: overall number of tasks (across all nodes)
- --ntasks-per-node=<ntasks>: number of tasks launched on each node
- -N <nodes>: number of nodes to allocate; a specific node list such as node001; node[001-005,009]; node001,node008 is requested with -w <node_list>
- --mail-type=<types>: NONE, BEGIN, END, FAIL, REQUEUE, ALL, ARRAY_TASKS, etc.; for further explanations see man 1 sbatch
- -v: run in verbose mode
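A job array combines several of these switches. sbatch processes #SBATCH lines at the top of a script until the first non-comment line. Because # also starts a comment in Python, a job script can be written in Python directly. The sketch below is illustrative only. The job name, the resource values and the input file names (input_0.dat to input_8.dat) are assumptions. In the output file name, %A expands to the array job id and %a to the array index.

```python
#!/usr/bin/env python3
#SBATCH -J array_demo
#SBATCH -a 0-8:2
#SBATCH -t 00:10:00
#SBATCH --mem=1G
#SBATCH -o array_demo_%A_%a.out
"""Minimal job-array sketch: each array task processes one hypothetical input file."""
import os

# Hypothetical work items; a real job would list its actual input files.
INPUTS = [f"input_{i}.dat" for i in range(9)]


def pick_input(task_id: int) -> str:
    """Map the array index (SLURM_ARRAY_TASK_ID) to one work item."""
    return INPUTS[task_id]


def main() -> None:
    # Outside SLURM these variables are unset; fall back to defaults for local testing.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    job_id = os.environ.get("SLURM_ARRAY_JOB_ID", "local")
    print(f"array job {job_id}, task {task_id}: processing {pick_input(task_id)}")


if __name__ == "__main__":
    main()
```

Submitted with sbatch array_demo.py (file name assumed), -a 0-8:2 yields five array tasks with SLURM_ARRAY_TASK_ID 0, 2, 4, 6 and 8.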
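To see which array indices an -a <index_range> value produces, the syntax can be expanded by hand. The hypothetical helper below is a sketch, not SLURM's own parser. It handles comma-separated single indices, start-end ranges and a :step suffix. SLURM also accepts a %<max_running> suffix that limits how many array tasks run at once. The sketch strips that suffix and otherwise ignores it.

```python
from typing import List


def expand_array_range(spec: str) -> List[int]:
    """Expand a job-array index spec such as '0,1,4-6' or '0-8:2' into indices.

    Sketch only: an optional '%<max_running>' suffix (limit on simultaneously
    running tasks) is stripped and otherwise ignored.
    """
    spec = spec.split("%", 1)[0]
    indices: List[int] = []
    for part in spec.split(","):
        step = 1
        if ":" in part:
            part, step_str = part.split(":", 1)
            step = int(step_str)
        if "-" in part:
            start_str, end_str = part.split("-", 1)
            start, end = int(start_str), int(end_str)
        else:
            start = end = int(part)
        # range() excludes its end, while SLURM index ranges include it.
        indices.extend(range(start, end + 1, step))
    return indices


print(expand_array_range("0,1,4-6"))  # [0, 1, 4, 5, 6]
print(expand_array_range("0-8:2"))    # [0, 2, 4, 6, 8]
```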
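The host-list notation in the node list examples above (node[001-005,009] and node001,node008) is shorthand for individual host names. The hypothetical helper below sketches that expansion. It assumes at most one bracket group per name and keeps zero padding. SLURM's own parser supports more forms than this.

```python
import re
from typing import List


def _split_outside_brackets(hostlist: str) -> List[str]:
    """Split on commas that are not inside [...]."""
    parts: List[str] = []
    current, depth = "", 0
    for ch in hostlist:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def expand_hostlist(hostlist: str) -> List[str]:
    """Expand 'node[001-005,009]' or 'node001,node008' into single host names.

    Sketch: supports at most one bracket group per name; zero padding is kept.
    """
    hosts: List[str] = []
    for item in _split_outside_brackets(hostlist):
        match = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", item)
        if match is None:
            hosts.append(item)  # plain host name, nothing to expand
            continue
        prefix, ranges, suffix = match.groups()
        for rng in ranges.split(","):
            if "-" in rng:
                lo, hi = rng.split("-", 1)
                width = len(lo)  # e.g. '001' -> pad numbers to 3 digits
                hosts.extend(
                    f"{prefix}{n:0{width}d}{suffix}" for n in range(int(lo), int(hi) + 1)
                )
            else:
                hosts.append(f"{prefix}{rng}{suffix}")
    return hosts


print(expand_hostlist("node[001-005,009]"))
```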
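The -t <time> value accepts several formats. man 1 sbatch lists "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds". The hypothetical helper below converts these formats to seconds, which makes it easy to check how long a requested wall clock limit actually is. It is a sketch and does not handle any special values.

```python
def time_limit_seconds(spec: str) -> int:
    """Convert a -t <time> value into seconds.

    Formats handled (as listed in man 1 sbatch): "minutes", "minutes:seconds",
    "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
    "days-hours:minutes:seconds".
    """
    days, hours, minutes, seconds = 0, 0, 0, 0
    if "-" in spec:
        day_str, rest = spec.split("-", 1)
        days = int(day_str)
        fields = [int(x) for x in rest.split(":")]
        # After "days-" the fields are hours[:minutes[:seconds]]; pad missing ones with 0.
        hours, minutes, seconds = fields + [0] * (3 - len(fields))
    else:
        fields = [int(x) for x in spec.split(":")]
        if len(fields) == 1:
            (minutes,) = fields
        elif len(fields) == 2:
            minutes, seconds = fields
        else:
            hours, minutes, seconds = fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(time_limit_seconds("30"))       # 1800
print(time_limit_seconds("1-12:30"))  # 131400
```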
Useful combinations of arguments for srun/sbatch/salloc
- -N and -n and --ntasks-per-node=
- -N and -n and -c
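Both combinations above describe a job layout. The number of nodes times the tasks per node gives the overall number of tasks (-n), and -c sets the CPUs (threads) each task may use, as in hybrid MPI/OpenMP jobs. The hypothetical helper below derives -n from the other two values so that the switches cannot contradict each other. It also computes how many CPUs each node must offer. It uses the long option name --ntasks-per-node as documented in man 1 sbatch. job.sh is a placeholder.

```python
import shlex
from typing import List


def build_layout_args(nodes: int, tasks_per_node: int, cpus_per_task: int = 1) -> List[str]:
    """Return -N/-n/--ntasks-per-node/-c switches with -n derived from the other two."""
    ntasks = nodes * tasks_per_node  # overall number of tasks across all nodes
    return [
        "-N", str(nodes),
        "-n", str(ntasks),
        f"--ntasks-per-node={tasks_per_node}",
        "-c", str(cpus_per_task),
    ]


def cpus_per_node(tasks_per_node: int, cpus_per_task: int) -> int:
    """CPUs (cores) each node must provide for this layout."""
    return tasks_per_node * cpus_per_task


# Example: 2 nodes x 4 tasks per node, 6 threads per task.
print(shlex.join(["sbatch", *build_layout_args(2, 4, 6), "job.sh"]))
# sbatch -N 2 -n 8 --ntasks-per-node=4 -c 6 job.sh
print(cpus_per_node(4, 6))  # 24
```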
Links
- https://www.lrz.de/services/compute/linux-cluster/batch_parallel/example_jobs (external, accessed 08-12-2018)
date of revision: 08-30-2018 © kraus