How to Submit a Simple Slurm GPU Job to Your Linux Cluster

This article shows you how to submit a simple Slurm GPU job to your local cluster using Bright Cluster Manager. Dead easy.

Let's get started.

First, load the Slurm environment module.

[rstober@atom-head1 local]$ module load slurm
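
As a quick sanity check (optional, not part of the original walkthrough), which sbatch should now resolve to the Slurm binaries the module put on your PATH:

[rstober@atom-head1 local]$ which sbatch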


Here's the jobscript we're going to submit to Slurm. This job requires two GPUs, and it runs one instance of the executable on each, in parallel.

[rstober@atom-head1 local]$ cat slurm-gpu-job.sh
#!/bin/sh

#SBATCH -o slurm-gpu-job.out
#SBATCH -p defq
#SBATCH --gres=gpu:2

module load cuda42/toolkit

# Launch one job step per GPU in the background so both run concurrently;
# each srun carves a single GPU out of the job's two-GPU allocation.
srun --gres=gpu:1 /home/rstober/OEM_Toolkit_Linux_64/bin/run_oem_binom.sh 1000 &
srun --gres=gpu:1 /home/rstober/OEM_Toolkit_Linux_64/bin/run_oem_binom.sh 1000 &

# Block until both steps have finished.
wait
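
If you want to convince yourself that each step really gets its own GPU, here's a minimal test script you could substitute (the name gpu-check.sh and its output file are placeholders, not part of the original example). When the GPUs are defined in gres.conf, Slurm exports CUDA_VISIBLE_DEVICES to each step, so the two steps should report different device indices:

[rstober@atom-head1 local]$ cat gpu-check.sh
#!/bin/sh

#SBATCH -o gpu-check.out
#SBATCH -p defq
#SBATCH --gres=gpu:2

# Each backgrounded step is bound to one of the job's two GPUs.
srun --gres=gpu:1 sh -c 'echo "step 1 sees GPU: $CUDA_VISIBLE_DEVICES"' &
srun --gres=gpu:1 sh -c 'echo "step 2 sees GPU: $CUDA_VISIBLE_DEVICES"' &
wait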

Next, submit the job using the sbatch command.

[rstober@atom-head1 local]$ sbatch slurm-gpu-job.sh
Submitted batch job 132
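
If you're driving submissions from a script, one way to capture the job ID is to pull the fourth field out of sbatch's "Submitted batch job" message (a small sketch; adjust to taste):

[rstober@atom-head1 local]$ JOBID=$(sbatch slurm-gpu-job.sh | awk '{print $4}')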

Use squeue to confirm that the job is running.

[rstober@atom-head1 local]$ squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
    132      defq slurm-gp  rstober   R       0:03      1 atom04
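
On a busy cluster you can narrow the listing to your own jobs, or to this job alone:

[rstober@atom-head1 local]$ squeue -u rstober
[rstober@atom-head1 local]$ squeue -j 132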

Use the scontrol command to get detailed information about the job.

[rstober@atom-head1 local]$ scontrol show job 132
JobId=132 Name=slurm-gpu-job.sh
   UserId=rstober(1001) GroupId=rstober(1001)
   Priority=4294901722 Account=(null) QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
   RunTime=00:00:18 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2013-01-24T07:24:41 EligibleTime=2013-01-24T07:24:41
   StartTime=2013-01-24T07:24:41 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=defq AllocNode:Sid=atom-head1:17343
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=atom04
   BatchHost=atom04
   NumNodes=1 NumCPUs=4 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=gpu:2 Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/home/rstober/slurm/local/slurm-gpu-job.sh
   WorkDir=/home/rstober/slurm/local
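
When the job finishes, its standard output lands in the file named by the -o directive (slurm-gpu-job.out here). If your cluster has Slurm accounting enabled, sacct will also report the job's final state and exit code, and scancel 132 would cancel the job if you needed to:

[rstober@atom-head1 local]$ cat slurm-gpu-job.out
[rstober@atom-head1 local]$ sacct -j 132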

Home free.

Looking for more guidance? Read this post on basic Slurm usage for Linux clusters.