Users whose codes are explicitly written to run on GPUs can use the GPU resources. Request a single GPU node on the ‘interactive’, ‘short’, ‘medium’, or ‘long’ partition, then use the GPUs allocated to the job. Each GPU node has 4 GPUs, and a single job can request more than one of them. As an example, let’s request one GPU and verify that PyTorch can access it.
The gpu_test.sl Slurm file is as follows.
#!/bin/bash
#SBATCH --job-name=sample # Job name
#SBATCH --output=%x_%j.out # Output file
#SBATCH --error=%x_%j.err # Error file
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 --cpus-per-task=1 # 1 CPU on a single node
#SBATCH --partition=short # Any partition with GPU nodes
#SBATCH --gres=gpu:1 # Number of GPUs per node
#SBATCH --mem-per-cpu=1g # Memory request per CPU
#SBATCH --time=00:10:00 # Time limit (hrs:min:sec)
#SBATCH --mail-type=BEGIN,END,FAIL # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your-BC-email-address # Email for notifications
cd (path to working directory)
module load miniconda
conda activate env_pytorch
python3 torch_test.py
And the corresponding Python script:
$ torch_test.py
----------------------------------------------------------
import torch

# Use the GPU if PyTorch can see one; otherwise fall back to the CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")
print(f"Device = {device}")
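Submit the job with sbatch gpu_test.sl. With the --output pattern above, the output file is named sample_<jobid>.out, and it should contain “Device = cuda”, indicating that the GPU is available to do work in PyTorch.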
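If the job reports “Device = cpu” instead, it can help to check which GPUs Slurm actually handed to the job before suspecting PyTorch. The sketch below assumes the cluster's Slurm configuration exports the CUDA_VISIBLE_DEVICES environment variable for jobs that request --gres=gpu:N, which is common but not confirmed by this guide. The helper name visible_gpu_ids is hypothetical, chosen for illustration.

```python
import os


def visible_gpu_ids(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES as a list of strings.

    Assumption: Slurm sets CUDA_VISIBLE_DEVICES for GPU jobs on this cluster.
    An unset or empty variable yields an empty list.
    """
    if environ is None:
        environ = os.environ
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    # Split on commas and drop empty tokens and stray whitespace
    return [token.strip() for token in raw.split(",") if token.strip()]


if __name__ == "__main__":
    ids = visible_gpu_ids()
    print(f"GPUs visible to this job: {len(ids)} {ids}")
```

Under that assumption, a job submitted with --gres=gpu:2 would be expected to list two ids, and an empty list suggests the job was not allocated a GPU at all.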
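To install the torch library, start an interactive session, create a conda environment, and install torchvision with pip, which pulls in torch as a dependency: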
[johnchris@a002 ~]$ interactive
Executing: srun --pty -N1 -n1 -c4 --mem=8g -pinteractive -t0-04:00:00 /bin/bash
Press any key to continue or ctrl+c to abort.
(press any key)
cpu-bind=MASK - c011, task 0 0 [1940718]: mask 0x4444000000 set
[johnchris@c011 ~]$ module load miniconda
[johnchris@c011 ~]$ conda create -n env_pytorch python=3.12
Retrieving notices: ...working... done
Channels:
-r
-bioconda
-conda-forge
-defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
[johnchris@c011 ~]$ conda activate env_pytorch
(env_pytorch) [johnchris@c011 ~]$ pip install torchvision
Collecting torchvision
Downloading torchvision-0.22.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (6.1 kB)
…
Successfully installed MarkupSafe-3.0.2 filelock-3.18.0 fsspec-2025.5.1 jinja2-3.1.6 mpmath-1.3.0 networkx-3.5 numpy-2.3.1 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu
(env_pytorch) [johnchris@c011 ~]$
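Once the install finishes, a quick check can confirm that both packages landed in the active environment. This is a minimal sketch using only the Python standard library; the helper name installed_version is hypothetical, and the script is meant to be run with the env_pytorch environment activated.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # torch is expected as a dependency of torchvision
    for name in ("torch", "torchvision"):
        found = installed_version(name)
        print(f"{name}: {found if found else 'not installed'}")
```

A "not installed" line for torch usually means the script ran outside the environment, so checking that the prompt shows (env_pytorch) is a reasonable first step.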