Users whose codes are explicitly written to run on GPUs can use the cluster's GPU resources. A user requests one or more GPUs on the 'gpuv100' partition, and the job can then make use of the requested GPUs. Multiple GPUs can be requested for a single job; each GPU node has 4 GPUs available. For instance, let's request a GPU node and verify that PyTorch can access the GPU.
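For a job that needs only a single GPU rather than the whole node, the relevant directives can be adjusted as sketched below (a minimal fragment, assuming the same 'gpuv100' partition described above):

```
#SBATCH --partition=gpuv100   # GPU node partition
#SBATCH --gres=gpu:1          # Request a single GPU instead of all 4
```

Requesting only the GPUs a code can actually use typically shortens queue wait times, since the remaining GPUs on the node stay available to other jobs.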
The gpu_test.sl Slurm file is as follows:
#!/bin/bash
#SBATCH --job-name=sample                  # Job name
#SBATCH --output=%x_%j.out                 # Output file
#SBATCH --error=%x_%j.err                  # Error file
#SBATCH --nodes=1                          # Number of nodes
#SBATCH --ntasks=1 --cpus-per-task=1       # 1 CPU on a single node
#SBATCH --partition=gpuv100                # GPU node partition
#SBATCH --gres=gpu:4                       # Number of GPUs per node
#SBATCH --mem-per-cpu=1g                   # Memory request per CPU
#SBATCH --time=00:10:00                    # Time limit (hrs:min:sec)
#SBATCH --mail-type=BEGIN,END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your-BC-email-address  # Email for notifications

module load pytorch/1.10.1gpu.cuda11.2
cd (path to working directory)
python3 torch_test.py
And the corresponding python script:
torch_test.py
----------------------------------------------------------
import torch

# Select the GPU if PyTorch can see one; otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(f"Device = {device}")
Submitting this job should output “Device = cuda”, indicating that the GPU is available to do work in PyTorch.
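As a complementary check that does not require PyTorch at all, the job script can inspect the CUDA_VISIBLE_DEVICES environment variable, which Slurm exports for jobs that request GPUs via --gres (standard Slurm behavior; the exact value depends on the allocation). A minimal sketch:

```python
import os

# Slurm sets CUDA_VISIBLE_DEVICES to a comma-separated list of the GPU
# indices allocated to this job, e.g. "0,1,2,3" for --gres=gpu:4.
visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
gpu_ids = [g for g in visible.split(",") if g]
print(f"GPUs visible to this job: {len(gpu_ids)}")
```

Run outside a GPU allocation, the variable is unset and the count is 0; inside the gpu_test.sl job above, the count should match the number of GPUs requested.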