How to submit an interactive job using Slurm

    

Here is an easy way to submit an interactive job to Slurm using srun. You can even submit a job to a cloud node this way.

Let's get started.


 

Here are some common examples of srun usage.

$ srun sleep 30
srun: Job step created

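The job was submitted, but where is it running? Use the scontrol command to get detailed job information. First, use squeue to get the job ID. Because srun normally stays in the foreground until the job finishes, run these commands from a second terminal.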

$ squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
   1877      defq    sleep  rstober   R       0:06      1 atom01
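When the job ID is needed in a script rather than read by eye, the JOBID column can be pulled out of this table with awk. The following is a minimal sketch, not part of Slurm: the function name job_ids_by_name and the variable squeue_sample are made up for illustration, and the sample text is copied from the output above so the snippet runs without a cluster.

```shell
# Print the JOBID column for every job whose NAME column equals $1.
# Reads squeue's default table on standard input and skips the header row.
# (job_ids_by_name is a hypothetical helper, not a Slurm command.)
job_ids_by_name() {
    awk -v name="$1" 'NR > 1 && $3 == name { print $1 }'
}

# Sample input copied from the squeue output shown above.
squeue_sample='  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
   1877      defq    sleep  rstober   R       0:06      1 atom01'

printf '%s\n' "$squeue_sample" | job_ids_by_name sleep
```

On a live cluster the same function would be fed from the real command, for example squeue | job_ids_by_name sleep. squeue may truncate long job names in this column, and it can also print selected columns directly (see the -h and -o options in its man page), which avoids parsing the table at all.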

The squeue output shows that the job ID is 1877. The scontrol output shows detailed job information. For example, the NodeList parameter shows that the job is running on the host "atom01".

$ scontrol show job 1877
JobId=1877 Name=sleep
   UserId=rstober(1001) GroupId=rstober(1001)
   Priority=4294901754 Account=(null) QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 ExitCode=0:0
   RunTime=00:00:15 TimeLimit=UNLIMITED TimeMin=N/A
   SubmitTime=2012-06-28T16:52:10 EligibleTime=2012-06-28T16:52:10
   StartTime=2012-06-28T16:52:10 EndTime=Unknown
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   Partition=defq AllocNode:Sid=atom-head1:8298
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=atom01
   BatchHost=atom01
   NumNodes=1 NumCPUs=2 CPUs/Task=1 ReqS:C:T=*:*:*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) Gres=(null) Reservation=(null)
   Shared=0 Contiguous=0 Licenses=(null) Network=(null)
   Command=/bin/sleep
   WorkDir=/home/rstober
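The scontrol output is a series of KEY=VALUE pairs, so a single field such as NodeList can be extracted in a script. This is a rough sketch under a few assumptions: job_field and scontrol_sample are hypothetical names, the sample is a subset of the output above, and values that contain spaces (a Command with arguments, for instance) would not be handled correctly.

```shell
# Print the value of one KEY=VALUE field from `scontrol show job` output
# read on standard input. The key must match from the start of the pair,
# so NodeList does not also match ReqNodeList or ExcNodeList.
# (job_field is a hypothetical helper, not a Slurm command.)
job_field() {
    tr -s ' ' '\n' | awk -v key="$1" '
        !found && index($0, key "=") == 1 {
            print substr($0, length(key) + 2)
            found = 1
        }'
}

# Subset of the scontrol output shown above.
scontrol_sample='JobId=1877 Name=sleep
   JobState=RUNNING Reason=None Dependency=(null)
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=atom01
   BatchHost=atom01'

printf '%s\n' "$scontrol_sample" | job_field NodeList
printf '%s\n' "$scontrol_sample" | job_field JobState
```

Against a running job this would be used as scontrol show job 1877 | job_field NodeList.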

Let's specify the cloud partition this time, using the -p option. This job will run on a cloud node.

$ srun -p cloud sleep 30
srun: Job step created

To direct the job output to a file, specify the -o filename option. This job will execute the hostname command, and Slurm will save the job's STDOUT and STDERR in the file "hostname.out" in the current directory.

$ srun -p defq -o hostname.out hostname
srun: Job step created

The contents of the hostname.out file show that this job also ran on atom01.

$ cat hostname.out
atom01
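The -p and -o options used above can be combined freely. As an illustration only, here is a small wrapper that assembles the srun command line from an optional partition and an optional output file. The names run_interactive and DRY_RUN are invented for this sketch; with DRY_RUN=1 the command is printed instead of executed, so the snippet can be tried on a machine without Slurm.

```shell
# Build an srun command line from an optional partition and output file.
# Pass "" for either argument to leave that option out.
# (run_interactive and DRY_RUN are hypothetical, not part of Slurm.)
run_interactive() {
    partition=$1
    outfile=$2
    shift 2
    # Prepend the options so that they come before the command to run.
    if [ -n "$outfile" ]; then set -- -o "$outfile" "$@"; fi
    if [ -n "$partition" ]; then set -- -p "$partition" "$@"; fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "srun $*"      # show the command without running it
    else
        srun "$@"
    fi
}

DRY_RUN=1
run_interactive defq hostname.out hostname
run_interactive cloud "" sleep 30
```

Unsetting DRY_RUN on a cluster login node would run the same commands for real.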
