By Robert Stober | February 18, 2013 | Slurm, GPU, Linux Cluster
Updated January 2021
This article shows you how to submit a simple Slurm GPU job to your local cluster using Bright Cluster Manager. Dead easy.
Let's get started.
First, load the Slurm environment module.
[rstober@atom-head1 local]$ module load slurm
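If you want to double-check that the module loaded, you can list the loaded modules and confirm that the Slurm commands are on your PATH (assuming a standard environment-modules setup):
[rstober@atom-head1 local]$ module list
[rstober@atom-head1 local]$ which sbatch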
Here's the jobscript we're going to submit to Slurm. The job requests two GPUs and runs one instance of the executable on each.
[rstober@atom-head1 local]$ cat slurm-gpu-job.sh
#!/bin/sh
#SBATCH -o slurm-gpu-job.out
#SBATCH -p defq
#SBATCH --gres=gpu:2
# Load the CUDA toolkit
module load cuda42/toolkit
# Launch one instance of the executable on each of the two GPUs, in the background
srun --gres=gpu:1 /home/rstober/OEM_Toolkit_Linux_64/bin/run_oem_binom.sh 1000 &
srun --gres=gpu:1 /home/rstober/OEM_Toolkit_Linux_64/bin/run_oem_binom.sh 1000 &
# Wait for both job steps to finish
wait
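Note the trailing ampersands: they start the two job steps in parallel, one per GPU, and the wait keeps the batch script alive until both finish. If you want to confirm that each step really gets its own GPU, a quick sanity check, assuming your cluster's GRES plugin sets CUDA_VISIBLE_DEVICES as Slurm does by default, is to run a throwaway step like this inside the script:
srun --gres=gpu:1 bash -c 'echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES on $(hostname)"'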
Next, submit the job using the sbatch command.
[rstober@atom-head1 local]$ sbatch slurm-gpu-job.sh
Submitted batch job 132
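If you plan to script around the submission, recent Slurm releases support the --parsable option, which makes sbatch print only the job ID. A minimal sketch, assuming a new enough Slurm version:
[rstober@atom-head1 local]$ JOBID=$(sbatch --parsable slurm-gpu-job.sh)
[rstober@atom-head1 local]$ echo $JOBID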
Use the squeue command to verify that the job is running.
[rstober@atom-head1 local]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
132 defq slurm-gp rstober R 0:03 1 atom04
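On a busy cluster you can narrow the listing to your own jobs, or to a single job:
[rstober@atom-head1 local]$ squeue -u rstober
[rstober@atom-head1 local]$ squeue -j 132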
Use the scontrol command to get detailed information about the job.
[rstober@atom-head1 local]$ scontrol show job 132
JobId=132 Name=slurm-gpu-job.sh
UserId=rstober(1001) GroupId=rstober(1001)
Priority=4294901722 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:18 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2013-01-24T07:24:41 EligibleTime=2013-01-24T07:24:41
StartTime=2013-01-24T07:24:41 EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=defq AllocNode:Sid=atom-head1:17343
ReqNodeList=(null) ExcNodeList=(null)
NodeList=atom04
BatchHost=atom04
NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=gpu:2 Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=/home/rstober/slurm/local/slurm-gpu-job.sh
WorkDir=/home/rstober/slurm/local
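When the job finishes, its standard output lands in slurm-gpu-job.out, the file named by the -o directive. If you also want to see how the GPUs were assigned on the node, scontrol's detail flag adds per-node GRES information (the exact fields vary by Slurm version):
[rstober@atom-head1 local]$ scontrol -d show job 132
[rstober@atom-head1 local]$ cat slurm-gpu-job.out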
Home free.
Looking for more guidance? Read this post on basic Slurm usage for Linux clusters.