Slurm Job Management

Jobs can be submitted in two ways: batch and interactive. An interactive job lets you type commands while the job is running. A batch job is a self-contained set of commands in a script that is submitted to the cluster and executed on a compute node. This section covers the basic commands needed to start running jobs.

All jobs must be submitted to the queuing system. On Andromeda, SLURM schedules jobs onto the compute nodes. The most common SLURM commands are as follows:

Submit jobs: sbatch, srun, salloc

Cancel one or more jobs in the queue: scancel

Show running and pending jobs: squeue

Display the status of jobs, nodes, partitions, etc.: scontrol, sinfo, shosts

Batch jobs

This is the most common type of job on an HPC system. Batch jobs are typically used for applications that run for long periods of time and need no manual input. Once the job is submitted, no further interaction is required; you can check the results when it finishes.

Parameters such as memory, number of cores, node type, partition, and requested wall-clock time are specified in a command file. On Andromeda, you can use a SLURM script file with the ‘.sl’ extension to submit a job to a compute node.

Here is an example of a sample.sl script file

This script requests 1 GB of memory and 1 CPU on a single node for 10 minutes. Slurm will send an email to your BC email address when the job changes to one of the states ‘BEGIN,’ ‘END,’ or ‘FAIL.’ Note: The maximum time that can be requested is 5 days (120:00:00), and the email address must be a BC email address.
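A script matching this description might look like the following sketch. It is an illustration built from the parameters described above, not the original listing: the job name, the placeholder email address, and the command in the body are assumptions, and Andromeda may require further options (such as a partition) that are not shown here.

```shell
#!/bin/bash
#SBATCH --job-name=sample              # hypothetical job name
#SBATCH --nodes=1                      # a single node
#SBATCH --ntasks=1                     # one task
#SBATCH --cpus-per-task=1              # 1 CPU
#SBATCH --mem=1G                       # 1 GB of memory
#SBATCH --time=00:10:00                # 10 minutes of wall-clock time
#SBATCH --mail-type=BEGIN,END,FAIL     # job state changes that trigger an email
#SBATCH --mail-user=username@bc.edu    # placeholder; replace with your BC email address

# Placeholder workload: report which node the job is running on.
node_name=$(uname -n)
echo "Running on ${node_name}"
```

The `#SBATCH` lines are ordinary comments to the shell, so only `sbatch` interprets them; everything after them runs as a normal shell script on the compute node.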

To submit the job via the SLURM script file sample.sl, type:

[johnchris@l001 ~]$ sbatch sample.sl
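Once a job is submitted, its job ID is what the other commands take as an argument. As a minimal sketch, the ID can be captured with the `--parsable` option of `sbatch` and passed to `squeue`. The function name `submit_and_check` is hypothetical and not part of SLURM or Andromeda.

```shell
# Hypothetical helper: submit a script, then show the job's queue entry.
submit_and_check() {
  local script="$1"
  local jobid
  # --parsable makes sbatch print only the job ID (optionally "id;cluster").
  jobid=$(sbatch --parsable "$script") || return 1
  jobid=${jobid%%;*}        # drop the cluster name if one is present
  echo "Submitted job ${jobid}"
  squeue -j "$jobid"        # show the queue entry for this job only
}

# Usage on a login node:
# submit_and_check sample.sl
```

To list all of your own running and pending jobs instead of a single one, `squeue -u "$USER"` can be used.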

scancel: cancel one or more jobs in the queue

“scancel jobid” cancels the job with the given job ID.

[johnchris@l001]$ scancel 2193185
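When several jobs need to be cancelled, the same command can be applied to each job ID in turn. The following is a sketch only; the function name `cancel_jobs` is hypothetical and the second job ID in the usage line is a made-up example.

```shell
# Hypothetical helper: request cancellation of each job ID given as an argument.
cancel_jobs() {
  local id
  for id in "$@"; do
    scancel "$id" && echo "Requested cancellation of job ${id}"
  done
}

# Usage on a login node:
# cancel_jobs 2193185 2193186
```

`scancel` also accepts filters in place of job IDs; for example, `scancel --user="$USER" --state=PENDING` cancels all of your own jobs that have not started yet.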
