Job Submission Options

Please refer to the full list of configurations on the Slurm website using the following link:
https://slurm.schedmd.com/sbatch.html

Many options are available, but we strongly recommend sticking with the basics, as they provide a good set of configuration options to meet your computing needs. The format of an #SBATCH directive is as follows:

#SBATCH --<option1>=<value>
#SBATCH --<option2>=<value>
#SBATCH --<option3>=<value>
...
#SBATCH --<optionN>=<value>

This structure allows you to specify multiple options and their corresponding values when configuring your Slurm batch script.
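Putting the pieces together, a complete batch script places these directives at the top, followed by the commands to run, and is submitted with sbatch. The script below is only a minimal sketch: the partition, resource values, and program name (my_program) are placeholders that should be replaced to match your own job.

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=shared
#SBATCH --time=0-01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4g
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

# Commands to run go after the #SBATCH directives.
./my_program

Submit the script with sbatch, for example: sbatch myscript.sh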

Job Computing Configurations

a). Job name:
#SBATCH --job-name=<job_name>
The first thing you will want to do is give your job a name. It should be descriptive yet concise; its purpose is to remind you of what the job is doing.
For example: #SBATCH --job-name=vasp

b). Partition:
#SBATCH --partition=<partition_name>
In a cluster computing environment, partitioning refers to the logical division of computing resources within the cluster. Each partition represents a subset of cluster nodes and resources, which allows efficient management and allocation of resources based on different criteria.
For example: #SBATCH --partition=shared
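To see which partitions exist on the cluster and their current availability, the standard Slurm command sinfo can be used (the partition name below is just an example):

sinfo
sinfo -p shared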

c). Time Limit:
#SBATCH --time=D-HH:MM:SS
Users are required to inform Slurm about the expected duration of their job. The format for specifying runtime is D-HH:MM:SS, representing Days-Hours:Minutes:Seconds. The maximum time limit is 5 days on Andromeda. If a user requests a runtime exceeding 5 days, the job will not be scheduled through Slurm.
For example: #SBATCH --time=3-10:30:00 indicates a runtime requirement of 3 days, 10 hours, 30 minutes, and 0 seconds.

If users would like to extend a job beyond 5 days, please submit a request to Research Services here:
Research Services Request Form

d). Email:
#SBATCH --mail-user=<your-BC-email-address>
#SBATCH --mail-type=<BEGIN,END,FAIL,ALL>

Users have the option to receive email notifications from Slurm regarding their compute jobs in the queue system and can specify the types of email notifications they wish to receive. Available options include BEGIN (when your job starts), END (when your job finishes), FAIL (if your job fails), and ALL (all of the previous conditions).
For example: #SBATCH --mail-user=johnchris@bc.edu
It’s important to note that only BC email addresses can be used.
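For instance, the following pair of directives (with a placeholder address) requests notifications when the job starts, ends, or fails:

#SBATCH --mail-user=username@bc.edu
#SBATCH --mail-type=BEGIN,END,FAIL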

Job Output Configurations

a). Output File:
#SBATCH --output=%x_%j.out
Users have the ability to specify the output file name for their running job in Slurm. %x is a variable that fills in the user’s job name, and %j is a variable that fills in the user’s job ID number. Additionally, users can choose to place their output file in a specific folder.
For example: #SBATCH --output=logs/%x_%j.out

b). Error File:
#SBATCH --error=%x_%j.err
Users have the ability to specify the error file name for their running job in Slurm. %x is a variable that fills in the job name, and %j is a variable that fills in the job ID number. Users can also choose to place their error file in a designated folder.
For example: #SBATCH --error=logs/%x_%j.err
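Note that Slurm generally does not create the destination folder for output or error files; if a folder such as logs/ in the examples above does not exist when the job starts, the files will not be written. Creating the folder before submitting the job (the script name below is a placeholder) avoids this:

mkdir -p logs
sbatch myscript.sh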

Compute Node Configurations

Users have the option to request just a single node. For jobs where the code can leverage MPI (Message Passing Interface), scheduling across multiple nodes might lead to quicker processing.

a). Nodes:
#SBATCH --nodes=<num_nodes>
Users can specify the number of nodes to allocate for a job.
For example: #SBATCH --nodes=1

b). Excluding Nodes:
#SBATCH --exclude=<node1,node2,...>
If users want to ensure that their job does not run on a specific node or nodes, they can achieve this using the --exclude option.
For example: #SBATCH --exclude=c001,c002,c003
#SBATCH --exclude=c[001-003]

c). Exclusive Access to a Node:
#SBATCH --exclusive
Users can choose to utilize all resources on a single node exclusively by specifying #SBATCH --exclusive. The following set of options grants the user exclusive access to an entire node:
#SBATCH --nodes=1
#SBATCH --exclusive

Task Configurations

In the Slurm context, tasks are processes that can be executed either on a single node or across multiple nodes. A task represents a running instance of a program; for most practical purposes, users can consider tasks as equivalent to processes.

a). Number of Tasks:
#SBATCH --ntasks=<num_tasks>
Slurm assigns one task per node by default. Users can specify more than one task as shown below:
For example: #SBATCH --ntasks=3

b). Number of Tasks per Node:
#SBATCH --ntasks-per-node=<num_tasks>
When using multiple nodes, users can specify the number of tasks to run per node.
For example: #SBATCH --ntasks-per-node=2
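As an illustration of how these options combine, the sketch below requests 2 nodes with 2 tasks on each node (4 tasks in total) and launches an MPI program with srun; mpi_program is a placeholder for the user's own executable.

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

srun ./mpi_program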

CPU Configurations

a). CPUs per Task:
#SBATCH --cpus-per-task=<num_cpus>
Slurm assigns one CPU per task by default. Users can request more than one CPU per task as shown below:
For example: #SBATCH --cpus-per-task=4
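For multithreaded (e.g. OpenMP) codes, a common pattern is to match the thread count to the number of CPUs requested per task using the SLURM_CPUS_PER_TASK environment variable; the program name below is a placeholder.

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program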

GPU Configurations

Andromeda provides two partitions with GPUs, namely gpuv100 and gpa100, which offer NVIDIA V100 and NVIDIA A100 GPUs, respectively. Users can request a GPU on either partition by adding the appropriate options to their Slurm file.

a). GPUs per Job:
#SBATCH --gres=gpu:<num_gpus>
Slurm will not allocate any GPUs to users’ jobs by default. Users need to specify how many and what type of GPUs their job would like to use. The following pair of options allows the job to run on a GPU node (NVIDIA V100) with 2 GPUs selected.
For example:
#SBATCH --partition=gpuv100
#SBATCH --gres=gpu:2
An interactive session with a single V100 GPU can also be requested directly with srun:
srun -p gpuv100 --gres=gpu:v100:1 --pty bash
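Put together, a batch script for a single-GPU job on the gpuv100 partition might look like the sketch below; the resource values and program name are placeholders that depend on the software being used.

#!/bin/bash
#SBATCH --job-name=gpu_test
#SBATCH --partition=gpuv100
#SBATCH --gres=gpu:1
#SBATCH --time=0-02:00:00
#SBATCH --output=%x_%j.out

./my_gpu_program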

Memory Configurations

a). Memory per Node:
#SBATCH --mem=<memory>
Users can specify how much memory is needed per node. The default unit is megabytes; the suffixes ‘k’, ‘m’, ‘g’, and ‘t’ stand for kilobytes (KB), megabytes (MB), gigabytes (GB), and terabytes (TB), respectively.
For example: #SBATCH --mem=10g
Note that this represents 10GB of memory per node.

b). Memory per CPU:
#SBATCH --mem-per-cpu=<memory>
Users can also set a RAM memory limit per CPU.
For example: #SBATCH --mem-per-cpu=10g
Note that this limit corresponds to 10GB of RAM memory per CPU.

Please ensure there is no conflict between the --mem and --mem-per-cpu parameters in the Slurm file. The following example cannot be scheduled due to an excessive memory request per CPU.

For example:

...
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=10m
#SBATCH --mem-per-cpu=10m

Note: This job requests 2 nodes with 1 task per node and 2 CPUs per task, for a total of 4 CPUs. The memory request is 10 MB per node. Since each node is allocated only 10 MB of memory for its 2 CPUs, the maximum that can be requested per CPU is 5 MB, so the 10 MB --mem-per-cpu request cannot be satisfied.
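A safe way to avoid this conflict is to specify only one of the two options, for example requesting memory per CPU alone and letting Slurm derive the per-node total (2 CPUs x 5 MB = 10 MB per node):

...
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=5m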

c). Whole Memory for One Node:
#SBATCH --mem=0
Users should specify #SBATCH --mem=0 to allocate all of the memory on a node when using #SBATCH --exclusive.
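For example, the following combination requests a single node exclusively together with all of its memory:

#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --mem=0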

Slurm Filename Variables

Please see the following link for details on filename patterns.
https://slurm.schedmd.com/sbatch.html#SECTION_%3CB%3Efilename-pattern%3C/B%3E

The following table lists some of the common filename variables that might be useful for users.

Variable | Description                                           | Example
%x       | Job name                                              | #SBATCH --output=%x_%j.out
%j       | Job ID of the running job                             | #SBATCH --error=%x_%j.err
%N       | Short hostname (creates a separate I/O file per node) | #SBATCH --output=%N_%x_%j.out
%a       | Job array ID (index) number                           | #SBATCH --error=%x_%a_%j.err
%%       | The character “%”                                     | #SBATCH --output=%x_20%%.out
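As an illustration of %a, a job array submission (using the standard --array option; the range 0-9 is only an example) can direct each array task's output to its own file:

#SBATCH --array=0-9
#SBATCH --output=%x_%a_%j.out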