Use the ‘sbatch’ command to submit your jobs:
[johnchris@l001 ~]$ sbatch strata_test.sl
sbatch: slurm_job_submit: Exclusive/Full Node jobs are billed for all cores and memory for all requested nodes.
Submitted batch job 2217698
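The file passed to ‘sbatch’ is an ordinary batch script whose #SBATCH directives request resources. A minimal sketch of what a script like strata_test.sl might contain (the partition, time, and resource values below are illustrative, not the contents of the actual file):
#!/bin/bash
#SBATCH --job-name=strata_test
#SBATCH --partition=shared      # illustrative partition name
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00         # walltime limit (HH:MM:SS)
#SBATCH --mem=8G                # memory per node

# Commands to run follow the #SBATCH directives
srun ./strata_test              # hypothetical executable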
scancel
Use the ‘scancel’ command to cancel a pending or running job:
[johnchris@l001 ~]$ scancel 2217697
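‘scancel’ also accepts filters instead of a single job ID; for example, the ‘-u’ option cancels all of a user’s jobs, and ‘-t’ restricts the action to jobs in a given state:
[johnchris@l001 ~]$ scancel -u johnchris
[johnchris@l001 ~]$ scancel -u johnchris -t PENDING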
sacct
‘sacct’ displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm Database, providing both live and historic data.
Run without options, ‘sacct’ lists your recent jobs with their job IDs and other details:
[johnchris@l001 ~]$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ --------- ---------- --------- ---------- ---------- --------
2217697 sample exclusive johnchris 48 CANCELLED+ 0:0
2217698 sample exclusive johnchris 48 PENDING 0:0
2217702 sample exclusive johnchris 48 PENDING 0:0
2217703 bash shared johnchris 4 RUNNING 0:0
2217703.ext+ extern johnchris 4 RUNNING 0:0
2217703.0 bash johnchris 4 RUNNING 0:0
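By default, ‘sacct’ reports only jobs that started since midnight of the current day. The ‘-S’ (start time) and ‘-E’ (end time) options widen the window; for example (the dates here are illustrative):
[johnchris@l001 ~]$ sacct -S 2024-06-01 -E 2024-06-14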
‘sacct -j jobid’ will display only the specified job(s):
[johnchris@l001 ~]$ sacct -j 2216924
JobID JobName Partition Account AllocCPUS State ExitCode
------------ --------- ---------- --------- ---------- ---------- --------
2216924 sample shared johnchris 12 RUNNING 0:0
2216924.bat+ batch shared johnchris 12 RUNNING 0:0
2216924.ext+ extern shared johnchris 12 RUNNING 0:0
‘sacct’ can provide a detailed report on the memory, CPU, and other usage metrics of a job:
[johnchris@l001 ~]$ sacct -j 2412831 --format JobID,Elapsed,ReqMem,MaxRSS,AllocCPUs,TotalCPU,State
JobID Elapsed ReqMem MaxRSS AllocCPUS TotalCPU State
------------ -------- ------- ------- --------- -------- ---------
2412831 03:39:19 128Gn 128 128 00:00:00 RUNNING
2412831.bat+ 03:39:19 128Gn 32 128 00:00:00 RUNNING
2412831.ext+ 03:39:19 128Gn 128 128 00:00:00 RUNNING
2412831.0 03:38:50 128Gn 3 128 00:00:00 RUNNING
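Memory fields such as MaxRSS and ReqMem are reported in mixed units by default; the ‘--units’ option converts them to a single unit (gigabytes in this sketch) for easier comparison:
[johnchris@l001 ~]$ sacct -j 2412831 --format JobID,Elapsed,ReqMem,MaxRSS,AllocCPUs,TotalCPU,State --units=G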
scontrol
‘scontrol show job jobid’ will show detailed information about the job with the specified ID, including the path to the job script:
[johnchris@l001 ~]$ scontrol show job 2412831
JobId=2412831 JobName=amber_md
UserId=johnchris(12345678) GroupId=johnchris(12345678) MCS_label=N/A
Priority=24310 Nice=0 Account=drjohn QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=03:55:29 TimeLimit=3-00:00:00 TimeMin=N/A
SubmitTime=2024-06-10T20:43:05 EligibleTime=2024-06-10T20:43:05
AccrueTime=2024-06-10T20:43:05
StartTime=2024-06-10T20:43:13 EndTime=2024-06-13T20:43:13 Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-06-10T20:43:13
Partition=shared AllocNode:Sid=l001:2156446
ReqNodeList=(null) ExcNodeList=g[001-012]
NodeList=c[013-015,027]
BatchHost=c013
NumNodes=4 NumCPUs=128 NumTasks=128 CPUs/Task=N/A ReqB:S:C:T=0:0:*:*
TRES=cpu=128,mem=512G,node=4,billing=256
Socks/Node=* NtasksPerN:B:S:C=32:0:*:* CoreSpec=*
MinCPUsNode=32 MinMemoryNode=128G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/mmfs1/data/johnchris/sample/test/example.slm
WorkDir=/mmfs1/data/johnchris/sample/test
StdErr=/mmfs1/data/johnchris/sample/test/slurm-2412831.out
StdIn=/dev/null
StdOut=/mmfs1/data/johnchris/sample/test/slurm-2412831.out
Power=
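The Command= field above gives only the path to the batch script. For a job still known to the controller, ‘scontrol write batch_script’ dumps the script itself; passing ‘-’ as the filename prints it to stdout:
[johnchris@l001 ~]$ scontrol write batch_script 2412831 -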