By Robert Stober | March 20, 2013 | workload manager, Slurm, Job Scheduler, Linux Cluster, open source scheduler
This article shows how you can easily manage Slurm jobs using the Bright Cluster Management Shell (CMSH). In jobs mode, CMSH lets you perform the same job management operations as the CMGUI through a convenient shell interface. For an example of managing jobs using the Bright CMGUI, check out my previous article on this topic.
Slurm is available as an admin-selectable, pre-configured option in Bright Cluster Manager. Other integrated workload managers include PBS Professional, LSF, Univa Grid Engine, OGS, TORQUE (with Maui or Moab) and openlava (an open-source derivative of LSF). More information here.
Now back to managing Slurm jobs. Let's get started.
First, start CMSH, enter jobs mode, and set the working scheduler to Slurm.
[root@atom-head1 ~]# cmsh
[atom-head1]% jobs
[atom-head1->jobs]% scheduler slurm
Working scheduler is slurm
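Note that, per the built-in help shown below, scheduler both displays and sets the working scheduler, so running it with no argument should simply report the current setting. A quick sketch (the exact output wording is an assumption):
[atom-head1->jobs(slurm)]% scheduler
Working scheduler is slurm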
The CMSH provides an integrated help system that lists the available sub-commands. Type ? at the jobs-mode prompt to display them.
[atom-head1->jobs(slurm)]% ?
...
[output omitted]
...
================================= Jobs =================================
format ........................ Modify or view current list format
hold .......................... Hold a job
list .......................... List overview
release ....................... Release a job
remove ........................ Remove a job
resume ........................ Resume a job
scheduler ..................... Display/set the default scheduler
show .......................... Show job information
suspend ....................... Suspend a job
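Two sub-commands from this list, hold and release, aren't exercised below. Assuming they take a job ID just like suspend and resume do, holding a pending job and then releasing it would look something like this (a sketch, not captured output; the job ID is illustrative):
[atom-head1->jobs(slurm)]% hold 112
[atom-head1->jobs(slurm)]% release 112
The native Slurm equivalents are scontrol hold 112 and scontrol release 112.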
List the jobs using the list command.
[atom-head1->jobs(slurm)]% list
Type Job ID User Queue Status Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm 111 rstober cloud RUNNING cnode1
Slurm 112 rstober cloudtransfers PENDING (Dependency)
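For a sanity check against Slurm's own tools, the native squeue command gives the same overview from a regular shell on the head node (its column layout differs from CMSH's list, but the job IDs and states will match):
[root@atom-head1 ~]# squeue -u rstober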
Show the details for a specific job.
[atom-head1->jobs(slurm)]% show 111
Parameter Value
------------------------------ ----------------------------------------
Arguments
Executable
In queue
Job ID 111
Job name tmpDSDJ7C.job
Mail list
Mail options
Maximum wallclock time UNLIMITED
Memory usage 0B
Nodes cnode1
Notify by mail no
Number of processes 1
Priority
Project
Queue cloud
Revision
Run directory
Running time 30
Start time
Status RUNNING
Stderr file
Stdin file
Stdout file
Submission time
Type Slurm
User rstober
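The native Slurm counterpart of show is scontrol show job, which reports a superset of these fields:
[root@atom-head1 ~]# scontrol show job 111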
You can suspend a running job using the suspend command.
[atom-head1->jobs(slurm)]% suspend 111
[atom-head1->jobs(slurm)]% list
Type Job ID User Queue Status Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm 111 rstober cloud SUSPEND+ cnode1
Slurm 112 rstober cloudtransfers PENDING (Dependency)
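The trailing + in SUSPEND+ appears to be column truncation of SUSPENDED rather than a distinct state. The native equivalent is scontrol suspend, which (like the CMSH operation) requires administrative privileges:
[root@atom-head1 ~]# scontrol suspend 111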
You can then resume the job using the resume command.
[atom-head1->jobs(slurm)]% resume 111
[atom-head1->jobs(slurm)]% list
Type Job ID User Queue Status Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm 111 rstober cloud RUNNING cnode1
Slurm 112 rstober cloudtransfers PENDING (Dependency)
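The native counterpart is scontrol resume:
[root@atom-head1 ~]# scontrol resume 111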
Or, kill a job using the remove command.
[atom-head1->jobs(slurm)]% remove 111
[atom-head1->jobs(slurm)]% list
Type Job ID User Queue Status Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm 112 rstober cloudtransfers RUNNING atom-head1
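The native equivalent of remove is scancel:
[root@atom-head1 ~]# scancel 111
Notice that as soon as job 111 was removed, job 112 went from PENDING (Dependency) to RUNNING; its dependency on job 111 was evidently satisfied.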
How easy is that?