Managing the Impact of Interactive Use, Part 2: Interactive Workloads via Bright

Because the impact of unmanaged interactive sessions can be significant,[1] the concept of login nodes in Bright Cluster Manager was introduced in Part 1 of this series.[2] Although login nodes address many considerations relating to interactive use, they are designed to do so in a limited way. For example, in Part 1, the following consideration was outlined (emphasis added here):

"Usage constraints - As its name suggests, a login node presents as a user-facing access service. It does not offer other services, and this includes the ability to act as a compute node.[3] Because users will submit computational workloads from this login node, however, the login node will need to be incorporated as a ‘submit-only node’ within the context of the workload manager."

In other words, in designing and implementing a solution for interactive use based on login nodes, interactive execution of computational workloads was out of scope in Part 1 of this series.

Some users, however, will have a legitimate need for executing computational workloads interactively:

  • Desired - The end-user application directly supports a desired mode of use that is interactive. For example, significant computational workloads can be produced in each of the following cases: interactive MATLAB, Python, R or shell sessions; real-time visualization or computational steering; and interactive use of in-house, community or commercial applications (with GUIs).
  • Required - The end-user application permits a required mode of use that is interactive. Debuggers and/or profilers, for example, can indirectly execute end-user applications using realistic computational workloads. For the end-user application to be debugged and/or profiled, interactive use of these developer-centric tools (via GUIs) is required. (Bright Cluster Manager includes the essential capabilities for debugging and profiling. Optionally, advanced solutions that interoperate well with Bright are also available from Bright Computing’s partners.)

Motivated by the need to support the interactive execution of computational workloads, or interactive workloads, the second part in this series focuses on the use of workload managers (WLMs) with Bright Cluster Manager. As the means of managing the impact of interactive use on the compute nodes of a cluster, Part 2 complements the introduction of login nodes that was the emphasis of Part 1.

Although the implementation specifics differ somewhat, all WLMs that interoperate with Bright Cluster Manager provide support for interactive workloads. In the simplest cases, the need to execute computational workloads interactively, whether desired or required, is indicated through use of:

  • An option included with the job-submission command or script. For example, the "-I" option with "qsub" in Altair PBS Professional or "bsub" in IBM Platform LSF specifies the need for interactive execution. Additional options allow for interactive shell sessions and/or X11 support to be enabled.
  • A specialized job submission command. For example, in Univa Grid Engine, interactive workloads can be submitted through use of the "qsh" or "qrsh" commands. Support for an interactive shell session with or without X11 support is built into "qsh". As its name suggests, "qrsh" implements an interactive remote shell in Univa Grid Engine.
  • The standard job submission command. For example, direct use of "srun" allows interactive workloads to be submitted via Slurm. For a detailed example involving a commercial CFD code, please consult “How do I run interactive Ansys Fluent using Slurm?” in the Bright Computing Knowledge Base.
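
The three submission styles above can be sketched as a small lookup in Python. This is only an illustration under stated assumptions: the function name interactive_argv, the LAUNCHERS table and the WLM keys are hypothetical; the "--pty" option for "srun" does not appear in this article and is assumed from Slurm's own documentation; and Altair PBS Professional is modelled as opening a session only, with the program then started from the resulting shell.

```python
import shlex
from typing import Dict, List, Tuple

# Hypothetical lookup table: WLM key -> (command prefix, whether the program
# is appended to the submission command in this sketch).
LAUNCHERS: Dict[str, Tuple[List[str], bool]] = {
    # "qsub -I" requests an interactive session; in this sketch the program
    # is started by hand from the shell that the session provides.
    "pbspro": (["qsub", "-I"], False),
    # "bsub -I a.out" is the form used later in this article.
    "lsf": (["bsub", "-I"], True),
    # "qrsh" is the specialized interactive command in Univa Grid Engine.
    "uge": (["qrsh"], True),
    # Assumption: "--pty" asks srun for a pseudo-terminal (not from the article).
    "slurm": (["srun", "--pty"], True),
}


def interactive_argv(wlm: str, program: List[str]) -> List[str]:
    """Return the argument vector for an interactive submission."""
    try:
        prefix, takes_program = LAUNCHERS[wlm.lower()]
    except KeyError:
        raise ValueError(f"unknown workload manager: {wlm!r}") from None
    return prefix + (list(program) if takes_program else [])


if __name__ == "__main__":
    # Print the commands only; nothing is submitted to a scheduler here.
    for name in LAUNCHERS:
        print(name, "->", shlex.join(interactive_argv(name, ["a.out"])))
```

The sketch only builds command lines, so it can be run on a machine without any WLM installed. Whether a given command is actually accepted still depends on the scheduling attributes discussed next.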

Appropriate use of job submission commands (specialized or not, with or without tailored options) is necessary for executing computational workloads interactively. This use, however, will not guarantee that these workloads will actually be scheduled for interactive execution. In other words, scheduling attributes that permit interactive execution must also be present. Together with appropriate submission, this second condition is sufficient for executing interactive workloads, and it is implemented in most of the WLMs that interoperate with Bright Cluster Manager. To summarize, permission to execute computational workloads interactively is handled through:

  • Queue-level scheduling attributes. For example, in Univa Grid Engine such queues possess "INTERACTIVE" in their "qtype" specification. In IBM Platform LSF, an analogous queue-level attribute for interactivity can be set from within Bright’s Cluster Management GUI (CMGUI) - please see “Interactive:” in the figure below.
  • Job-submission overrides. For example, in the case of Altair PBS Professional, a hook (i.e., a block of Python code) can be used to accept or reject computational workloads requesting interactive execution via "qsub -I". During this same pre-scheduling phase, job-submission parameters can be adjusted (e.g., redirection to a queue that supports interactive execution). (Please consult Section 6.7.6 in the Altair PBS Professional 12.1 Administration Guide for additional details.) In the case of Slurm, partitions do not restrict the interactive execution of computational workloads through configuration specifics. As a consequence, a plugin has been developed to override this default behaviour by ensuring that a batch flag is present upon job submission.
  • Queue-level scheduling attributes combined with job-submission overrides. For example, in IBM Platform LSF suppose that both "BATCHQ" and "INTERACTIVEQ" are configured as default queues (i.e., "DEFAULT_QUEUE=BATCHQ INTERACTIVEQ"). Additionally, "INTERACTIVE=NO" exists as a queue-level attribute for "BATCHQ", whereas "INTERACTIVE=YES" is specified for "INTERACTIVEQ". With this configuration in place, "bsub a.out" submits "a.out" to "BATCHQ", while "bsub -I a.out" submits it to "INTERACTIVEQ". Again, this configuration can be implemented in IBM Platform LSF through use of Bright’s CMGUI.
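
The IBM Platform LSF example in the last bullet can be reasoned through with a toy model in Python. This is a sketch of the selection logic as described above, not LSF's implementation: the function names accepts and select_queue are hypothetical, and the model assumes that the default queues are tried in the order listed, that "INTERACTIVE=NO" rejects interactive jobs, and that "INTERACTIVE=YES" accepts both batch and interactive jobs.

```python
from typing import Dict, List, Optional


def accepts(interactive_attr: str, job_is_interactive: bool) -> bool:
    """Model a queue-level INTERACTIVE attribute (assumed semantics)."""
    attr = interactive_attr.upper()
    if attr == "NO":
        # The queue takes batch jobs only.
        return not job_is_interactive
    if attr == "YES":
        # The queue takes both batch and interactive jobs.
        return True
    raise ValueError(f"attribute value not modelled: {interactive_attr!r}")


def select_queue(
    default_queues: List[str],
    queue_attrs: Dict[str, str],
    job_is_interactive: bool,
) -> Optional[str]:
    """Return the first default queue that accepts the job, else None."""
    for name in default_queues:
        if accepts(queue_attrs[name], job_is_interactive):
            return name
    return None


# Configuration taken from the example: DEFAULT_QUEUE=BATCHQ INTERACTIVEQ
DEFAULT_QUEUE = ["BATCHQ", "INTERACTIVEQ"]
QUEUE_ATTRS = {"BATCHQ": "NO", "INTERACTIVEQ": "YES"}

if __name__ == "__main__":
    # "bsub a.out" is a batch submission; "bsub -I a.out" is interactive.
    print("bsub a.out    ->", select_queue(DEFAULT_QUEUE, QUEUE_ATTRS, False))
    print("bsub -I a.out ->", select_queue(DEFAULT_QUEUE, QUEUE_ATTRS, True))
```

Under these assumptions the batch job lands in "BATCHQ" because it is the first queue that accepts it, while the interactive job is refused there and falls through to "INTERACTIVEQ", which matches the behaviour described in the example.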

Users can have legitimate needs for executing computational workloads interactively. Because login nodes are not designed to handle such workloads, there is a need to support interactivity using compute nodes managed by workload managers. Through use of appropriate submission parameters, and configuration of the workload manager, the execution of interactive workloads can be managed efficiently and effectively through Bright Cluster Manager.

Notes:

[1] Interactive sessions that are not controlled or regulated are termed unmanaged.

[2] In Part 1, a broader context for login nodes is provided. Interactive use is one of the considerations addressed by this user-facing access service.

[3] In some cases, however, a login node may need to serve as a proxy compute node. Using this proxy mechanism, login nodes accept and forward computational workloads to compute nodes for execution. This is the default configuration for Cray supercomputers, for example. In this case, workload is forwarded through use of API calls to the Cray Application Level Placement Scheduler (ALPS) application launch and schedule utility.

Acknowledgements: In addition to numerous discussions with his colleagues at Bright Computing, the author gratefully acknowledges the assistance of Scott Suchyta (Altair), Cameron Brunner (Univa Corp.), Bill Bryce (Univa Corp.), Bill McMillan (IBM Platform) and David Bigagli (SchedMD).
