Users whose codes are explicitly written to run on GPUs can use the cluster's GPU resources. A user requests one or more GPUs on the 'gpuv100' partition, and the job can then make use of the requested GPUs. Multiple GPUs can be requested for a single job; each GPU node has 4 GPUs available. For instance, let's request a GPU node and verify that PyTorch can access the GPU.
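For a job that needs only a single GPU rather than the whole node, the relevant directives can be adjusted as sketched below (a minimal fragment, assuming the same 'gpuv100' partition described above):

```
#SBATCH --partition=gpuv100   # GPU node partition
#SBATCH --gres=gpu:1          # Request a single GPU instead of all 4
```

Requesting only the GPUs a code can actually use typically shortens queue wait times, since the remaining GPUs on the node stay available to other jobs.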
The gpu_test.sl Slurm file is as follows:
#!/bin/bash
#SBATCH --job-name=sample                  # Job name
#SBATCH --output=%x_%j.out                 # Output file
#SBATCH --error=%x_%j.err                  # Error file
#SBATCH --nodes=1                          # Number of nodes
#SBATCH --ntasks=1 --cpus-per-task=1       # 1 CPU on a single node
#SBATCH --partition=gpuv100                # GPU node partition
#SBATCH --gres=gpu:4                       # Number of GPUs per node
#SBATCH --mem-per-cpu=1g                   # Memory request per CPU
#SBATCH --time=00:10:00                    # Time limit (hrs:min:sec)
#SBATCH --mail-type=BEGIN,END,FAIL         # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=your-BC-email-address  # Email for notifications

module load pytorch/1.10.1gpu.cuda11.2
cd (path to working directory)
python3 torch_test.py
And the corresponding python script:
torch_test.py
----------------------------------------------------------
import torch

# Select the GPU if PyTorch can see one; otherwise fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(f"Device = {device}")
Submitting this job should output “Device = cuda”, indicating that the GPU is available to do work in PyTorch.
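As a complementary check that does not require PyTorch at all, the job script can inspect the CUDA_VISIBLE_DEVICES environment variable, which Slurm exports for jobs that request GPUs via --gres (standard Slurm behavior; the exact value depends on the allocation). A minimal sketch:

```python
import os

# Slurm sets CUDA_VISIBLE_DEVICES to a comma-separated list of the GPU
# indices allocated to this job, e.g. "0,1,2,3" for --gres=gpu:4.
visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
gpu_ids = [g for g in visible.split(",") if g]
print(f"GPUs visible to this job: {len(gpu_ids)}")
```

Run outside a GPU allocation, the variable is unset and the count is 0; inside the gpu_test.sl job above, the count should match the number of GPUs requested.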