Bright Cluster Manager® offers a wide choice of workload managers, also known as queuing (or queueing) systems. Most leading workload managers are integrated into Bright Cluster Manager, and many are included on the Bright Cluster Manager DVD, either completely free of charge or with a free temporary evaluation license.
Benefits of Workload Manager Integration
Bright Cluster Manager is integrated with most leading workload managers. The integration exists on multiple levels, providing many powerful benefits to system administrators and users:
- Automatic installation — During the installation of Bright Cluster Manager, you can select from a list of all available workload managers. The selected workload manager will then automatically be installed in the right locations on the head node and regular node images.
- Automatic configuration — Throughout the lifetime of the cluster, from installation to expansion to day-to-day management, Bright Cluster Manager inserts and updates the relevant workload manager configuration.
- Manageable from CMGUI and CMSH — Through CMGUI and CMSH you can view and manipulate jobs and configure the workload manager without having to learn its specific configuration commands or files. For queues, CMGUI and CMSH support add, remove and edit; for jobs, they support show, remove, hold, release, suspend and resume.
- Viewable from the User Portal — In the User Portal, users can see the status of the workload manager and a summary of relevant workload management statistics, as well as their own jobs in the available queues.
- Manageable from the SOAP API — All actions and data available through CMGUI and CMSH — including workload manager related actions — are also available through the Bright Cluster Manager SOAP API and its C++, Python and PHP bindings.
- Failover managed by Bright — Bright Cluster Manager's built-in, native failover capability also manages the seamless failover of the workload manager.
- Workload manager statistics — Many workload manager metrics are sampled and analyzed by Bright Cluster Manager over the lifetime of the cluster. Examples include the number of completed, failed, queued and running jobs; estimated delay; and average job duration.
- Health checking — One of Bright's most powerful types of health check is the pre-job health check, which checks the health of nodes just before a job is started on them. This helps prevent jobs from crashing because of node health problems, a failure pattern known as Black Hole Node Syndrome. This kind of health check is only possible when the cluster manager and the workload manager work closely together.
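Conceptually, a pre-job health check filters the candidate nodes just before a job is dispatched. The Python sketch below illustrates that idea only. The `Node` class, the metric names `scratch_free_gb` and `ib_link_up`, the thresholds, and the `prejob_filter` function are all hypothetical, and none of them are Bright Cluster Manager's health check API or configuration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A hypothetical health check: takes a node's sampled metrics, returns True if healthy.
HealthCheck = Callable[[Dict[str, float]], bool]


@dataclass
class Node:
    name: str
    metrics: Dict[str, float]  # hypothetical sampled values for this node
    drained: bool = False      # True once the node is taken out of service


def scratch_space_ok(metrics: Dict[str, float]) -> bool:
    # Assumed threshold: at least 10 GB free on local scratch.
    return metrics.get("scratch_free_gb", 0.0) >= 10.0


def ib_link_ok(metrics: Dict[str, float]) -> bool:
    # Assumed encoding: 1.0 means the InfiniBand link is up.
    return metrics.get("ib_link_up", 0.0) == 1.0


def prejob_filter(nodes: List[Node], checks: List[HealthCheck]) -> List[Node]:
    """Run every check on every candidate node just before dispatch.

    Nodes that pass all checks are returned; nodes that fail any check
    are drained so that no further jobs are sent to them.
    """
    healthy = []
    for node in nodes:
        if all(check(node.metrics) for check in checks):
            healthy.append(node)
        else:
            node.drained = True
    return healthy


nodes = [
    Node("node001", {"scratch_free_gb": 50.0, "ib_link_up": 1.0}),
    Node("node002", {"scratch_free_gb": 2.0, "ib_link_up": 1.0}),
    Node("node003", {"scratch_free_gb": 80.0, "ib_link_up": 0.0}),
]
selected = prejob_filter(nodes, [scratch_space_ok, ib_link_ok])
print([n.name for n in selected])  # ['node001']
```

Draining a failing node, rather than only skipping it for a single job, is what stops a "black hole" node from repeatedly accepting and failing new jobs.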
Integrated Workload Managers
The following workload managers are integrated in Bright Cluster Manager:
- Grid Engine — Grid Engine is a powerful workload manager which includes both queuing and scheduling functionality. Both Open Grid Scheduler and Univa Grid Engine are integrated in Bright Cluster Manager.
- LSF — LSF (Load Sharing Facility) is a commercial, proprietary workload manager from Platform Computing.
- Maui Cluster Scheduler — Maui is a powerful open source job scheduler which can provide scheduling intelligence to TORQUE.
- Moab Workload Manager — Moab is a powerful commercial job scheduler from Adaptive Computing which can provide scheduling intelligence to TORQUE and other workload managers.
- openlava — openlava is an open source fork of Platform Lava, which is based on Platform LSF™ version 4.2.
- PBS Professional — PBS Professional (Portable Batch System Professional) is a commercial, proprietary workload manager from Altair Engineering which includes both queuing and scheduling functionality. It was originally developed for NASA.
- SLURM — SLURM (Simple Linux Utility for Resource Management) is an open source resource manager with a plug-in architecture, used in many large installations at the US National Labs. It includes both queuing and scheduling functionality.
- TORQUE Resource Manager — TORQUE (Terascale Open-Source Resource and QUEue Manager) is an open source, distributed resource manager originally based on OpenPBS. It has only limited scheduling intelligence built in, which is why it is usually combined with the Maui Cluster Scheduler or Moab Workload Manager.
Alternatively, you can install and configure other workload managers with Bright Cluster Manager, but you will not get the benefits of integration with Bright Cluster Manager.
Workload Management Features
All integrated workload managers offer at least the following features:
- Fairness policies — define what a cluster owner considers fair use of the available resources.
- Advanced reservation — guarantees the availability of a set of resources at a particular time.
- Job priority policies and configurations — determine the order in which jobs should run to achieve a predefined fair-share policy.
- Quality of Service (QoS) support — allows jobs, users or groups to receive special treatment based on privileges and fairness policies.
- Multi-attribute fairshare — allows historical resource utilization information to be incorporated into job feasibility and priority decisions.
- Configurable node allocation policies — allow a site to specify how available resources should be allocated to each job.
- Multiple configurable backfill policies — allow a scheduler to make better use of available resources by running lower-priority jobs out of order when doing so does not delay higher-priority jobs.
- System diagnostic support — provides commands for diagnosing system behavior.
- Allocation manager support and interface — manages resource allocations, where an allocation grants a job the right to use a particular amount of resources (the allocation manager is also known as an allocation bank or CPU bank).
- Resource utilization tracking and statistics — provides extensive accounting facilities which allow resource usage to be tracked by resource (e.g., compute nodes), job, user, and other objects.
- Non-intrusive 'Test' modes — run scheduling cycles exactly as they would in normal or production mode, but without actually starting or modifying jobs.
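Backfill is easiest to understand in a small model. The following Python sketch implements a simplified form of EASY backfill, one common backfill strategy. The highest-priority blocked job receives a reservation at the earliest time enough nodes become free (the "shadow time"). Lower-priority jobs may start now only if they finish before that time or fit on nodes the reserved job will not need. The `Job` class, the `easy_backfill` function, and the resource model based only on node counts are assumptions made for illustration. Real schedulers offer several configurable backfill policies and weigh many more factors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int     # number of nodes requested
    walltime: int  # requested wall-clock limit, in minutes


def easy_backfill(total_nodes: int, running: List[Tuple[int, int]],
                  queue: List[Job], now: int = 0) -> List[str]:
    """Return the names of jobs started at time `now`.

    `running` holds (nodes, end_time) pairs for jobs already running;
    `queue` lists waiting jobs in priority order. Every job is assumed
    to request at most `total_nodes` nodes.
    """
    running = list(running)
    free = total_nodes - sum(n for n, _ in running)
    pending = list(queue)
    started: List[str] = []

    # 1. Start jobs strictly in priority order while they fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running.append((job.nodes, now + job.walltime))
        started.append(job.name)
    if not pending:
        return started

    # 2. Reserve nodes for the first blocked job: find the "shadow time"
    #    at which enough running jobs have ended for it to start.
    head = pending[0]
    avail, shadow = free, now
    for nodes, end in sorted(running, key=lambda r: r[1]):
        if avail >= head.nodes:
            break
        avail += nodes
        shadow = end
    extra = avail - head.nodes  # nodes still spare once the head job starts

    # 3. Backfill later jobs only if they cannot delay the reservation.
    for job in pending[1:]:
        if job.nodes > free:
            continue
        if now + job.walltime <= shadow:   # finishes before the reservation
            free -= job.nodes
            started.append(job.name)
        elif job.nodes <= extra:           # runs on nodes the head job won't need
            free -= job.nodes
            extra -= job.nodes
            started.append(job.name)
    return started


queue = [Job("A", 8, 30), Job("B", 2, 50), Job("C", 3, 120), Job("D", 2, 120)]
# 10-node cluster; one running job holds 6 nodes until t=60.
print(easy_backfill(10, [(6, 60)], queue))  # ['B', 'D']
```

In the example, job A must wait until t=60. Job B fits in the idle nodes and ends by t=50, and job D uses the two nodes A will not need, so both run out of order without delaying A.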
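Multi-attribute fairshare folds historical usage into job priority. In the hypothetical Python sketch below, each user's usage is recorded per fairshare window, and older windows are down-weighted by a decay factor. The priority adjustment is proportional to the gap between the user's target share and their decayed share of total usage. The function names, the `decay` and `weight` parameters, and the linear formula are all assumptions for illustration. Each workload manager defines its own fairshare formula and configuration parameters.

```python
from typing import Dict, List


def decayed_usage(windows: List[float], decay: float) -> float:
    """Weight recorded usage per window; windows[0] is the most recent.

    Window i counts with weight decay**i, so older usage matters less.
    """
    return sum(usage * decay ** i for i, usage in enumerate(windows))


def fairshare_priority(history: Dict[str, List[float]],
                       targets: Dict[str, float],
                       decay: float = 0.5,
                       weight: float = 100.0) -> Dict[str, float]:
    """Priority adjustment per user: positive if below target share, negative if above."""
    usage = {user: decayed_usage(w, decay) for user, w in history.items()}
    total = sum(usage.values()) or 1.0  # avoid division by zero when nobody has run jobs
    return {user: weight * (targets[user] - usage[user] / total)
            for user in history}


# Core-hours per window (most recent first); both users target a 50% share.
history = {"alice": [100.0, 0.0], "bob": [0.0, 100.0]}
targets = {"alice": 0.5, "bob": 0.5}
prio = fairshare_priority(history, targets)
print({u: round(p, 2) for u, p in prio.items()})  # {'alice': -16.67, 'bob': 16.67}
```

Both users consumed the same total, but bob's usage lies one window further back and is halved by the decay. His pending jobs therefore receive the higher fairshare priority.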