**Still under construction**

====== Using the SLURM batch job facility ======
\\
===== Common Commands =====
  - slurm (introduction)
  - interactive jobs: **salloc** (interactive jobs only) and **srun** (interactive jobs and tasks)
  - batch jobs: **sbatch** (submit batch jobs to a queue)
  - **scancel** (signal/stop running jobs)
  - **sstat** (information about running jobs)
  - **squeue** (show queues)
  - sinfo (show node information)
\\
===== Manpages =====
  * ''man 1 sbatch''
  * ''man 1 srun''
  * ''man 1 salloc''
  * ''man 1 scancel''
  * ''man 1 squeue''
  * ''man 1 sinfo''
\\
===== Common terms =====
  * cpu: a hardware processor core (not a logical CPU, i.e. no hyperthreading)
  * node: a computing host
  * queue: scheduling of jobs on a partition/group of nodes
  * partition: a group of nodes with similar computing capabilities
  * task: a process run within a SLURM job context (does not need its own resource allocation)
  * allocation: a reservation of resources for an interactive or batch job
  * job: a collection of job steps (needs a resource allocation)
  * step: a single command run as part of a job
\\
===== Important switches for sbatch/srun/salloc =====
  - ''-a'' job array with index range, e.g. ''-a 0,1,4-6'' or ''-a 0-8:2'' (environment variables available: **SLURM_ARRAY_TASK_ID**, **SLURM_ARRAY_JOB_ID**, **SLURM_ARRAY_TASK_COUNT** etc.)
  - ''-d'' define dependencies on other jobs (dependency_list: **after:job_id[:job_id]** or **afterok:job_id[:job_id]** and so on)
  - ''--begin='' (absolute/relative) start time of the job (see ''man 1 sbatch'')
  - ''--deadline='' the job is scheduled only if it can finish before the deadline
  - ''--mem='' amount of resident memory **per node** to allocate, e.g. 1G (1 gibibyte)
  - ''-t