How to manage Slurm jobs using the Bright Cluster Management Shell (CMSH)

    

This article shows how you can easily manage Slurm jobs using the Bright Cluster Management Shell (CMSH). In jobs mode, the CMSH lets you perform the same job management operations as the CMGUI through a convenient shell interface. For an example of managing jobs using the Bright CMGUI, check out my previous article on this topic.

Slurm is available as an admin-selectable, pre-configured option in Bright Cluster Manager. Other integrated workload managers include PBS Professional, LSF, Univa Grid Engine, OGS, TORQUE (with Maui or Moab), and openlava (open source LSF). More information here.

Now back to managing Slurm jobs. Let's get started.


First, start the CMSH, enter jobs mode, and set the working scheduler to Slurm.

[root@atom-head1 ~]# cmsh
[atom-head1]% jobs
[atom-head1->jobs]% scheduler slurm
Working scheduler is slurm
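
The same three steps can probably be run without an interactive session. The sketch below assumes that your cmsh version accepts a semicolon-separated command string through its -c option; the helper names cmsh_jobs_cmd and cmsh_jobs are hypothetical and not part of Bright Cluster Manager. On a machine without cmsh, the helper only prints the command it would have run.

```shell
# Build the command string: enter jobs mode, select Slurm, then run the sub-command.
cmsh_jobs_cmd() {
    printf 'jobs; scheduler slurm; %s' "$*"
}

# Run a jobs-mode sub-command non-interactively (assumes `cmsh -c` is supported).
# If cmsh is not on the PATH, print the command instead of running it.
cmsh_jobs() {
    if command -v cmsh >/dev/null 2>&1; then
        cmsh -c "$(cmsh_jobs_cmd "$@")"
    else
        echo "cmsh -c \"$(cmsh_jobs_cmd "$@")\""
    fi
}

cmsh_jobs list
```

A wrapper like this is handy in cron jobs or monitoring scripts, where opening an interactive shell is not practical.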

The CMSH provides an integrated help system that lists the available sub-commands. While in jobs mode, type ? to display them.

[atom-head1->jobs(slurm)]% ?
...
[output omitted]
...
================================= Jobs =================================
format ........................ Modify or view current list format
hold .......................... Hold a job
list .......................... List overview
release ....................... Release a job
remove ........................ Remove a job
resume ........................ Resume a job
scheduler ..................... Display/set the default scheduler
show .......................... Show job information
suspend ....................... Suspend a job
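
If you already know the native Slurm tools, it may help to see how these sub-commands line up with them. The mapping below is a rough correspondence only: whether the CMSH calls these commands internally is an assumption, and the function name slurm_equivalent is hypothetical. The function prints the native command and does not run it.

```shell
# Print the native Slurm command that roughly corresponds to a CMSH jobs sub-command.
# $1 is the sub-command, $2 is the job ID (unused for list).
slurm_equivalent() {
    case "$1" in
        list)    echo "squeue" ;;
        show)    echo "scontrol show job $2" ;;
        hold)    echo "scontrol hold $2" ;;
        release) echo "scontrol release $2" ;;
        suspend) echo "scontrol suspend $2" ;;
        resume)  echo "scontrol resume $2" ;;
        remove)  echo "scancel $2" ;;
        *)       echo "no direct equivalent for: $1" >&2; return 1 ;;
    esac
}

slurm_equivalent suspend 111
```

The format and scheduler sub-commands are specific to the CMSH, so the function reports them as having no direct equivalent.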

List the jobs using the list command.

[atom-head1->jobs(slurm)]% list
Type         Job ID       User         Queue           Status   Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm        111          rstober      cloud           RUNNING  cnode1
Slurm        112          rstober      cloudtransfers  PENDING  (Dependency)
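
Because the listing is a plain whitespace-separated table, it is easy to filter in a script. The following sketch assumes the six-column layout shown above, with the status in the fifth column; the function name jobs_in_state is hypothetical, and the sample text is copied from this article.

```shell
# Print the IDs of Slurm jobs in a given state, reading `list` output on stdin.
# Long states may be truncated in the table (for example SUSPEND+), so pass
# the state exactly as it appears in the Status column.
jobs_in_state() {
    awk -v state="$1" '$1 == "Slurm" && $5 == state { print $2 }'
}

# Sample listing copied from the article.
sample='Type         Job ID       User         Queue           Status   Nodes
------------ ------------ ------------ --------------- -------- ----------
Slurm        111          rstober      cloud           RUNNING  cnode1
Slurm        112          rstober      cloudtransfers  PENDING  (Dependency)'

printf '%s\n' "$sample" | jobs_in_state PENDING   # prints 112
```

Matching on the Type column skips the header and separator lines without counting line numbers.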

Show the details for a specific job.

[atom-head1->jobs(slurm)]% show 111
Parameter                      Value
------------------------------ ----------------------------------------
Arguments
Executable
In queue
Job ID                         111
Job name                       tmpDSDJ7C.job
Mail list
Mail options
Maximum wallclock time         UNLIMITED
Memory usage                   0B
Nodes                          cnode1
Notify by mail                 no
Number of processes            1
Priority
Project
Queue                          cloud
Revision
Run directory
Running time                   30
Start time
Status                         RUNNING
Stderr file
Stdin file
Stdout file
Submission time
Type                           Slurm
User                           rstober
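
The show output can be scripted in a similar way. Parameter names may contain spaces, so this sketch matches the name at the start of the line and expects at least two spaces before the value. That spacing is an assumption based on the output above, and job_field is a hypothetical helper name. The sample lines are copied from this article.

```shell
# Print the value of one parameter from `show` output read on stdin.
# Prints nothing if the parameter is missing or has an empty value.
job_field() {
    awk -v key="$1" '
        index($0, key) == 1 {
            rest = substr($0, length(key) + 1)
            # Require two or more spaces so that "Job" does not match "Job ID".
            if (rest ~ /^  +[^ ]/) {
                sub(/^ +/, "", rest)
                sub(/ +$/, "", rest)
                print rest
                exit
            }
        }'
}

# A few lines copied from the article.
sample='Job ID                         111
Job name                       tmpDSDJ7C.job
Maximum wallclock time         UNLIMITED
Priority
Status                         RUNNING'

printf '%s\n' "$sample" | job_field "Status"   # prints RUNNING
```
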

You can suspend a running job using the suspend command.

[atom-head1->jobs(slurm)]% suspend 111

[atom-head1->jobs(slurm)]% list
Type         Job ID       User         Queue           Status   Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm        111          rstober      cloud           SUSPEND+ cnode1
Slurm        112          rstober      cloudtransfers  PENDING  (Dependency)

You can then resume the job using the resume command.

[atom-head1->jobs(slurm)]% resume 111

[atom-head1->jobs(slurm)]% list
Type         Job ID       User         Queue           Status   Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm        111          rstober      cloud           RUNNING  cnode1
Slurm        112          rstober      cloudtransfers  PENDING  (Dependency)

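Alternatively, kill a job using the remove command. In this example, job 112 had been pending on a dependency and starts running after job 111 is removed.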

[atom-head1->jobs(slurm)]% remove 111

[atom-head1->jobs(slurm)]% list
Type         Job ID       User         Queue           Status   Nodes
------------ ------------ ------------ --------------- -------- ----------------------------------------
Slurm        112          rstober      cloudtransfers  RUNNING  atom-head1

 

How easy is that?
