RFP Features


The list of unique features below can be used to help customers write their Request for Proposal (RFP) in a way that will get them the best possible cluster management software.

We propose that the customer add all or some of the requirements below to their RFP. The "WHY" text may be included in the RFP itself, but can also be used as internal justification for adding the requirement to the RFP.

When providing the list below to a customer, use the print version, which is formatted for printing and excludes the comments about the competition.

Contents

  1. Linux

  2. GUI

  3. Command Line Interface

  4. Node Groups

  5. Parallel Shell

  6. Daemon and Database

  7. Software Image Management

  8. Software Provisioning — Basics

  9. Software Provisioning — Scalability

  10. Failover

  11. Monitoring — General

  12. Monitoring — Graphs

  13. Monitoring — Rackview

  14. Cluster Automation

  15. User Management

  16. Authentication and Authorization

  17. Workload Management

  18. GPU Computing

  19. HPC Environment

  20. Cluster Health Checking

  21. Cluster Installation

  22. Power Management

  23. Miscellaneous


Linux

  1. The underlying Linux distribution must be Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Scientific Linux, CentOS, or Ubuntu.

    WHY: These are the Linux distributions that we are familiar with and that our applications are known to work with.

COMPETITION: Rocks only supports CentOS. PCM probably supports all the above. xCAT really does support all of the above, including Ubuntu.

GUI

  1. All common management functionality (node control, node provisioning, monitoring, alarm setting, parallel shell, queuing system configuration, queue job control, user management, role-based authentication) must be available through the same single GUI.

    WHY: A single graphical user interface helps decrease the time required to learn a new product, makes it easier to locate information and control functions, and provides familiarity to new users. Having multiple GUIs that provide different functionality increases complexity for cluster administrators, and therefore TCO.

  2. Managed PDUs and Ethernet switches must be configurable through the GUI.

    WHY: Ethernet switches and PDUs are an integral part of a turnkey cluster. A single interface to the turnkey cluster reduces cluster management complexity and therefore TCO.

  3. It must be possible to manage multiple clusters from the same GUI.

    WHY: Managing multiple clusters from the same cluster management interface greatly improves efficiency and therefore reduces TCO.

  4. It should be possible to run the cluster GUI from any operating system. The Bright Cluster Manager GUI is available through any modern browser, on PCs, laptops, or tablets.

    WHY: Some cluster administrators run Linux, some run Windows on their workstations and laptops. Having the freedom of choice and flexibility improves efficiency and therefore decreases TCO.

COMPETITION: xCAT does not have a GUI. Rocks can use Ganglia and Nagios, optionally. PCM uses Ganglia and Nagios. However, none of them support all the functionality listed above.

Command Line Interface

  1. All common management functionality (node control, node provisioning, monitoring, alarm setting, parallel shell, queuing system configuration, queue job control, user management, role-based authentication) must be available through the same single command line-based cluster management shell.

    WHY: Having one CLI, instead of multiple CLIs for different functionality, decreases complexity for cluster administrator(s) and therefore reduces TCO.

  2. The cluster management shell must support tab-based command completion, looping over objects, execution of system commands, piping into regular Linux shells such as bash, and execution in batch mode.

    WHY: A powerful, flexible cluster management shell is a huge time-saver for system administrators. One consistent CLI also reduces the learning curve for new system administrators.

COMPETITION: Rocks, PCM, and xCAT all use multiple different CLIs.

Node Groups

  1. It must be possible for the administrator to assemble nodes into groups in order to apply common cluster management commands (such as reboot, node provisioning, and power on/off) to all nodes in a group.

    WHY: Being able to apply commands to groups of nodes greatly improves cluster manageability and therefore reduces TCO.

  2. The grouping of the nodes should be done centrally, i.e. only once for all relevant management commands.

    WHY: Being able to define groups centrally greatly improves cluster manageability and therefore reduces TCO.

  3. The grouping of the nodes should also be persistent, and not only for the duration of the execution of the command.

    WHY: Not having to redefine the group of nodes with every command execution greatly improves cluster manageability and therefore reduces TCO.

COMPETITION: Most competitors can define groups of nodes in some form or another. However, the groups are often not cluster-wide applicable, or they are not persistent.

Parallel Shell

  1. A parallel shell functionality must be provided which allows any Linux command to be executed on any combination of slave nodes. The commands must be executed in parallel and the output must be collected in parallel.

    WHY: A parallel shell is considered one of the most fundamental cluster administration tools. A powerful, flexible, and easy-to-use parallel shell greatly improves manageability and therefore reduces TCO.

  2. To allow convenient visualization of the output, it must be possible to group the output per node in real time, as the results come in (see the sketch after this list).

    WHY: Parallel shell tools often only allow the output to be grouped after all the output has been received. This is inefficient, especially when the output is produced over an extended period of time.
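
Purely as an illustration of the kind of parallel shell described above — not a description of any particular product — here is a minimal Python sketch, assuming plain ssh access and hypothetical node names, that runs a command on several nodes in parallel and prints each node's output grouped under its own header as soon as it arrives:

    # Minimal parallel shell sketch: run one command on a set of nodes over SSH,
    # collect output per node as it arrives. Node names and the use of plain ssh
    # are illustrative assumptions.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor, as_completed

    NODES = ["node001", "node002", "node003"]   # hypothetical slave nodes

    def run_on_node(node, command):
        """Run `command` on `node` via ssh and return (node, combined output)."""
        result = subprocess.run(["ssh", node, command],
                                capture_output=True, text=True, timeout=60)
        return node, (result.stdout + result.stderr).strip()

    def parallel_shell(command):
        """Execute on all nodes in parallel; print each node's output, grouped
        under its own header, as soon as that node finishes."""
        with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
            futures = {pool.submit(run_on_node, n, command): n for n in NODES}
            for future in as_completed(futures):
                node, output = future.result()
                print(f"--- {node} ---")
                print(output)

    if __name__ == "__main__":
        parallel_shell("uptime")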

COMPETITION: Rocks, PCM, and xCAT all use command-line based parallel shells, such as pdsh and psh, which do not allow visualization of output grouped by node in real-time.

Daemon and Database

  1. In order to save compute resources for computational work, a maximum of two management daemons is allowed: one for the queuing system, one for all other cluster management operations.

    WHY: Limiting the number of daemons used for cluster management reduces resource consumption and therefore reduces TCO.

  2. The cluster management daemon should consume less than 25MB of memory and less than 1 CPU-core minute per 3 days per compute node.

    WHY: Limiting the amount of compute and memory resources used directly reduces waste of resources and therefore reduces TCO.

  3. All cluster configuration information and all monitoring data must each be stored in a single, easy-to-back-up database.

    WHY: This increases efficiency, manageability, and reliability, and therefore reduces TCO.

COMPETITION: Because Rocks, PCM, and xCAT use the "toolkit" approach, they use different tools — each with their own daemon and database — for different functionality. For example, they use at least one daemon and database for Ganglia and another daemon and database for Nagios. xCAT runs its own daemon and database in addition to that.

Software Image Management

  1. It must be possible to create and use a (practically) unlimited number of software images for slave nodes.

    WHY: This increases flexibility and manageability, and therefore reduces TCO.

  2. Software image updates should be possible without having to reboot nodes.

    WHY: Rebooting nodes means downtime and disruption which leads to increased TCO.

  3. It must be possible to do incremental software image updates. For example, if only a few files have changed, then only those files are distributed to all slave nodes (see the sketch after this list).

    WHY: This increases the speed with which images are updated and it reduces the use of valuable network resources, therefore decreasing TCO.

  4. It must be possible to incorporate a change in the software on a running slave node into the software image on the provisioning node.

    WHY: This allows changes to be made dynamically and live on the slave node before incorporating them into the image. It also avoids having to incorporate the changes manually. This improves efficiency and therefore reduces TCO.

  5. It should be possible to obtain updates to cluster management software through package management tools such as YUM, APT or zypper.

    WHY: This improves efficiency and therefore reduces TCO.
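
As an illustration of requirements 3 and 4, here is a minimal Python sketch of file-level incremental synchronization using rsync; the image path, node names, and exclusion list are assumptions, and a production implementation would need considerably more care:

    # File-level incremental image updates: rsync only transfers files that
    # differ between the image and the node. Paths, node names and excluded
    # directories are illustrative assumptions only.
    import subprocess

    IMAGE_DIR = "/cm/images/default-image/"   # hypothetical software image path
    NODES = ["node001", "node002"]            # hypothetical slave nodes
    EXCLUDES = ["--exclude=/proc", "--exclude=/sys", "--exclude=/dev"]

    def push_image(node):
        """Push only changed files from the image to a running node (image -> node)."""
        subprocess.run(["rsync", "-a", "--delete", *EXCLUDES,
                        IMAGE_DIR, f"root@{node}:/"], check=True)

    def grab_node(node):
        """Incorporate changes made on a running node back into the image (node -> image)."""
        subprocess.run(["rsync", "-a", *EXCLUDES,
                        f"root@{node}:/", IMAGE_DIR], check=True)

    for node in NODES:
        push_image(node)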

COMPETITION: Competitors tend to provision using Kickstart or AutoYaST. Changes to the software on the slave nodes tend to be made by copying files over to nodes, installing RPMs on slaves in parallel, or completely re-provisioning a node. So their minimum level of granularity is an RPM, whereas in Bright the minimum granularity is a file.

Software Provisioning — Basics

  1. Node identification should be based on switch-port detection. The master node should be able to use switch port information from the managed Ethernet switches to match MAC addresses to port numbers, IP addresses, and node names. Replacing a node should not require manually updating MAC addresses anywhere.

    WHY: This provides powerful information about the location of servers in the cluster, and it greatly simplifies the procedure of replacing or adding a node in the cluster and therefore reduces TCO.

  2. It must be possible to perform a full automatic upgrade of the BIOS or change the BIOS configuration on all slave nodes of the cluster without manual intervention on the slave nodes.

    WHY: This improves control over the cluster and therefore reduces TCO.

  3. It must be possible to provision nodes with local hard disks (diskfull) and nodes without local hard disks (diskless).

    WHY: Diskfull and diskless nodes both have advantages. Having the flexibility to use both improves efficiency and therefore reduces TCO.

  4. It must be possible to provision over InfiniBand.

    WHY: This is a must for InfiniBand-only clusters, which are more cost-effective than clusters that require Ethernet.

COMPETITION: xCAT does all the above. Rocks and PCM may be able to do some of the above.

Software Provisioning — Scalability

  1. It must be possible to do load-balanced software image provisioning from more than one provisioning node. If one provisioning node fails, the other should take over automatically.

    WHY: Being able to balance the load of the provisioning nodes allows scaling to very large clusters and it reduces the time required for image provisioning. It also provides redundancy and therefore reduces downtime. Both decrease TCO.

COMPETITION: Rocks apparently supports a form of peer-to-peer provisioning, but it does not work well. PCM's scalability is unknown. xCAT has a good reputation for scalability.

Failover

  1. It must be possible to have more than one head node such that a secondary head node can take over management of the cluster from the primary if and when required, for example when a hardware fault occurs on the primary head node.

    WHY: Failover capability reduces downtime and the impact of a hardware failure. It therefore reduces TCO.

  2. Jobs started through the workload management system should continue to run in the event of a failover.

    WHY: Crashing jobs as a result of a head node failover can be very expensive in a production environment.

  3. Failover capability must be a native, built-in feature that should not require third-party software tools such as Heartbeat or Red Hat Cluster Suite.

    WHY: Using a separate software tool for failover increases complexity for the system administrators and therefore TCO.

  4. It should be possible to clone a head node with minimal effort so that it can be used as a replacement for the original head node.

    WHY: This reduces recovery time, increases uptime, and therefore reduces TCO.

COMPETITION: Neither Rocks, nor PCM, nor xCAT has built-in support for failover head nodes. They all rely on third-party tools and tailor-made, custom configuration of failover capability.

Monitoring — General

  1. It must be possible to select which metrics are sampled and how often they are sampled.

    WHY: Being able to customize metrics sampling like this increases efficiency and reduces resource consumption of the monitoring system. It therefore contributes to reducing TCO.

  2. It must be possible to sample "custom metrics" from any piece of software or hardware attached to the cluster that is capable of making a value readable to a Linux system. Such "custom metrics" should subsequently be treated as any other, pre-defined metric.

    WHY: This powerful feature allows the system administrator to extend monitoring capability to applications and non-cluster hardware. It therefore increases the value and applicability of the cluster management software, and with that increases value-for-money and reduces TCO.

  3. It should be possible for custom metrics to sample multiple values at once (i.e. a single command should be able to provide values for multiple metrics); see the sketch after this list.

    WHY: Having to probe a device once for each metric is inefficient and consumes CPU cycles, thereby increasing TCO.

  4. IPMI metrics should be sampled out of band. The IPMI driver may only be loaded while the BMC in a node is being configured. During normal node operation, the IPMI drivers should not be loaded.

    WHY: The IPMI driver consumes a lot of CPU cycles on slave nodes, which slows down compute jobs. Accessing the BMC out-of-band is much more efficient and therefore reduces TCO.

  5. It must be possible to configure how monitoring data is consolidated over time, for each individual metric.

    WHY: This increases efficiency and reduces resource consumption (such as disk space and network capacity) of the monitoring system. It therefore reduces TCO.

  6. The monitoring functionality must be integrated with the alerts & alarms setting functionality, such that the same metrics and database that are used for monitoring are also used for setting thresholds, alarms and actions for those metrics.

    WHY: This greatly improves efficiency and reduces complexity because the administrator(s) only have to learn and use one interface and make changes to settings only once. It also reduces resource consumption. All of this reduces TCO.
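
As an illustration of requirements 2–4, the following minimal Python sketch shows a custom metric script that returns several values from a single invocation, sampling the BMC out of band with ipmitool over the LAN (so no IPMI kernel driver is needed on the node); the BMC address, credentials, and output parsing are illustrative assumptions:

    # Custom metric sampler returning several values from a single invocation,
    # reading the BMC out of band with ipmitool. BMC address, credentials and
    # the sensors present are illustrative assumptions.
    import subprocess

    BMC_HOST = "node001-bmc"            # hypothetical out-of-band BMC address
    BMC_USER, BMC_PASS = "admin", "secret"

    def sample_bmc_metrics():
        """Return a dict of metric name -> value sampled in one ipmitool call."""
        out = subprocess.run(
            ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
             "-U", BMC_USER, "-P", BMC_PASS, "sensor"],
            capture_output=True, text=True, check=True,
        ).stdout
        metrics = {}
        for line in out.splitlines():
            # ipmitool 'sensor' output columns: name | value | unit | status | ...
            fields = [f.strip() for f in line.split("|")]
            if len(fields) >= 2:
                try:
                    metrics[fields[0]] = float(fields[1])
                except ValueError:
                    pass        # skip discrete or unreadable sensors
        return metrics

    if __name__ == "__main__":
        # Emit all metric values at once for the monitoring system to pick up.
        for name, value in sample_bmc_metrics().items():
            print(f"{name}={value}")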

COMPETITION: Rocks, PCM, and xCAT all rely on Ganglia or Cacti for monitoring, which is not as configurable as Bright. They all rely on Nagios for system automation and setting alerts and alarms.

Monitoring — Graphs

  1. It must be possible to visualize more than one metric simultaneously in one time/value graph. For example, it must be possible to plot in one graph the CPU temperatures of node X, node Y, and node Z.

    WHY: This feature allows an easy comparison of the metrics between nodes, therefore enhancing insight into the health of the cluster.

  2. It must be possible to dynamically zoom in to a specific, user-selected area of a value/time metrics graph, for example by using a mouse to draw a rectangle over the area to be zoomed into.

    WHY: Finding an interesting feature in a graph often requires zooming in on the part of the graph that shows that feature. Having access only to fixed "monthly", "weekly", and "daily" graphs is not enough.

  3. It should be possible to plot the total power usage of the cluster over time by extracting information from the PDUs.

    WHY: Insight into the power usage is essential for power management and power saving, thus reducing TCO.

COMPETITION: Rocks, PCM, and xCAT all rely on Ganglia or Cacti for monitoring, neither of which has the above functionality.

Monitoring — Rackview

  1. The GUI must be able to visualize where each server and switch is located in the server racks ("Rackview").

    WHY: This feature greatly improves insight into the actual health of the cluster and reduces time-to-fix in case of errors. Furthermore, cluster administrator(s) can thus minimize their presence in the server room to maximize work efficiency and minimize health & safety and security exposure.

  2. The GUI must allow at least two metrics to be visualized in a "Rackview", for example using colors.

    WHY: This feature gives insight in the correlation between two metrics and greatly improves insight into the actual health of the cluster. It also reduces time-to-fix in case of errors. Furthermore, cluster administrator(s) can thus minimize their presence in the server room to maximize work efficiency and minimize health & safety and security exposure.

COMPETITION: Rocks, PCM, and xCAT do not have the above functionality.

Cluster Automation

  1. It must be possible to configure thresholds and actions for any metric available to the Linux operating system or IPMI tool. Any Linux command must be usable as an action (see the sketch after this list).

    WHY: This improves ease-of-use and configurability of the cluster and therefore reduces TCO.

  2. It must be very easy to add new rules for actions and thresholds, for example by using a wizard.

    WHY: This improves ease-of-use and configurability of the cluster and therefore reduces TCO.
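
The sketch below illustrates the idea of threshold/action rules in a few lines of Python; the metric names, thresholds, and actions are illustrative assumptions, not a description of any particular product:

    # Threshold/action rules: when a sampled metric crosses a threshold, run an
    # arbitrary Linux command as the action.
    import subprocess

    RULES = [
        # (metric name, threshold, comparison, action command)
        ("cpu_temperature", 85.0, ">", "logger 'CPU temperature threshold exceeded'"),
        ("free_disk_gb",    10.0, "<", "logger 'Disk space low'"),
    ]

    def check_rules(samples):
        """samples: dict of metric name -> latest value."""
        for metric, threshold, op, action in RULES:
            value = samples.get(metric)
            if value is None:
                continue
            fired = value > threshold if op == ">" else value < threshold
            if fired:
                # Any Linux command can serve as the action.
                subprocess.run(action, shell=True, check=False)

    check_rules({"cpu_temperature": 90.5, "free_disk_gb": 42.0})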

COMPETITION: Rocks, PCM, and xCAT all rely on Nagios for cluster automation. Nagios can configure thresholds and actions, but it is not nearly as easy to use as Bright and does not have an "add rules" wizard.

User Management

  1. Adding, removing, and editing users of the cluster from a central location must be possible through the cluster management software.

    WHY: This improves efficiency and therefore reduces TCO.

  2. It must be possible to connect to an LDAP server or Active Directory server outside the cluster to retrieve user account information (see the sketch after this list).

    WHY: This allows easy interaction with the local infrastructure and therefore increases efficiency and reduces TCO.
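
As an illustration of requirement 2, the following Python sketch retrieves POSIX user accounts from an external directory server using the third-party ldap3 library; the server address, bind credentials, and search base are assumptions:

    # Retrieve user account information from an external LDAP server.
    # Server address, bind DN, password and search base are illustrative.
    from ldap3 import Server, Connection, ALL

    server = Server("ldap://ldap.example.com", get_info=ALL)
    conn = Connection(server, "cn=admin,dc=example,dc=com", "secret", auto_bind=True)

    # Look up all POSIX accounts and print login name, uid number and home directory.
    conn.search("ou=People,dc=example,dc=com", "(objectClass=posixAccount)",
                attributes=["uid", "uidNumber", "homeDirectory"])
    for entry in conn.entries:
        print(entry.uid, entry.uidNumber, entry.homeDirectory)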

Authentication and Authorization

  1. Authentication to the cluster management infrastructure should be handled through X.509 certificates. In addition, all communication from outside the cluster to the cluster management infrastructure should be encrypted (see the sketch after this list).

    WHY: Strong authentication and encryption are important for security reasons.

  2. It must be possible to define user profiles whereby each profile defines exactly what management features and commands are available to users with this profile.

    WHY: This allows different system administrators and users to have different permissions on the cluster and therefore increases security and efficiency, and reduces TCO.
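
As an illustration of requirement 1, the following Python sketch opens an encrypted connection to a management daemon and presents an X.509 client certificate for authentication; the host name, port, and certificate paths are assumptions:

    # Certificate-based (X.509) authentication over an encrypted connection,
    # using Python's standard ssl module. Host, port and file paths are
    # illustrative assumptions.
    import socket
    import ssl

    MGMT_HOST, MGMT_PORT = "headnode.example.com", 8081   # hypothetical daemon endpoint

    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH,
                                         cafile="/etc/cluster/ca.pem")
    # Present a client certificate so the management daemon can authenticate us.
    context.load_cert_chain(certfile="/etc/cluster/admin.pem",
                            keyfile="/etc/cluster/admin.key")

    with socket.create_connection((MGMT_HOST, MGMT_PORT)) as sock:
        with context.wrap_socket(sock, server_hostname=MGMT_HOST) as tls:
            print("Connected with cipher:", tls.cipher())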

COMPETITION: xCAT supports the above. Rocks and PCM probably do not.

Workload Management

  1. The workload management system should allow for tight integration with the parallel middleware (e.g. MPI) runtime environment. This ensures that correct accounting, resource limits, and process control for parallel applications can be enforced.

    WHY: This increases the efficiency of the cluster and therefore reduces TCO.

  2. An application launcher with clean-up capabilities should be available to ensure that all processes are fully terminated when jobs finish or are killed.

    WHY: If applications are not launched with such a tool, "rogue" processes can consume compute and memory resources and can affect other "legitimate" processes, and thereby reduce the value of the cluster.

GPU Computing

  1. GPU units and GPUs should be manageable through the cluster management interface.

    WHY: This is essential for GPU clusters.

  2. All GPU metrics available from NVIDIA DCGM must be sampled and available for visualization with graphs and Rackview (see the sketch after this list).

    WHY: This is essential for GPU clusters.
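
As an illustration of requirement 2, the sketch below samples a handful of GPU metrics per GPU; it uses nvidia-smi as an illustrative stand-in for a DCGM-based sampler, and the metric selection is an assumption:

    # Sample a few GPU metrics per GPU for the monitoring system, using
    # nvidia-smi as a stand-in for a DCGM-based sampler.
    import subprocess

    QUERY = "index,utilization.gpu,memory.used,temperature.gpu,power.draw"

    def sample_gpus():
        """Return one dict of metrics per GPU in the node."""
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        gpus = []
        for line in out.splitlines():
            idx, util, mem, temp, power = [v.strip() for v in line.split(",")]
            gpus.append({"gpu": int(idx), "utilization_pct": float(util),
                         "memory_used_mib": float(mem),
                         "temperature_c": float(temp), "power_w": float(power)})
        return gpus

    for gpu in sample_gpus():
        print(gpu)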

HPC Environment

  1. At least the following HPC software must be included, configured, and, where applicable, cross-compiled:

    GNU, Intel, PGI, Open64 compilers

    Open MPI, MPICH, MPICH2, MVAPICH, MVAPICH2 (cross-compiled against the above compilers)

    Intel threading libraries, OpenMP, Global Arrays

    Intel MKL, LAPACK, architecture-optimized BLAS libraries (cross-compiled against the above compilers where applicable)

    WHY: The above software is required for many user applications. Having it pre-installed, configured and cross-compiled reduces work for the cluster administrator(s) and therefore reduces TCO.

  2. A fully configured "modules" (see https://modules.sourceforge.net) environment must be included.

    WHY:A "modules" environment greatly improves efficiency for the users and therefore reduces the value of the cluster to the users.


COMPETITION: xCAT does not provide user software. Rocks and PCM do to some extent.

Cluster Health Checking

  1. A flexible cluster health checking mechanism should be provided that checks the health of nodes just before a job is submitted to those nodes by the queuing system (see the sketch after this list).

    WHY: Frequently checking the health of nodes improves job throughput and reduces downtime. This reduces TCO.

  2. A flexible cluster health checking mechanism should be provided that allows a configurable set of health checks to be run when nodes are not being used.

    WHY: Frequently checking the health of nodes improves job throughput and reduces downtime. This reduces TCO.

  3. A flexible cluster "burn-in" mechanism should be provided that performs intense and thorough health checking of hardware. Functionality should at least include health checks of CPUs, memory, hard disks, and power supplies.

    WHY: Frequently checking the health of nodes improves job throughput and reduces downtime. This reduces TCO.
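
As an illustration of the kind of health check described above, the following Python sketch could be run by the queuing system just before job dispatch; the thresholds and checked resources are illustrative assumptions:

    # Prejob health check: the queuing system runs this on a node right before
    # dispatching a job and holds the node if it exits non-zero.
    import os
    import shutil
    import sys

    def find_problem():
        # Enough free space on the local scratch file system?
        if shutil.disk_usage("/tmp").free < 5 * 1024**3:
            return "less than 5 GB free in /tmp"
        # Node not already overloaded by stray processes?
        if os.getloadavg()[0] > 2 * (os.cpu_count() or 1):
            return "load average far above core count"
        return None

    problem = find_problem()
    if problem:
        print(f"HEALTH CHECK FAILED: {problem}", file=sys.stderr)
        sys.exit(1)      # non-zero exit marks the node as unhealthy
    sys.exit(0)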

COMPETITION: It is unclear if Rocks, PCM, or xCAT include any of the above functionality.

Cluster Installation

  1. It must be possible to do a cluster installation in graphical mode, not requiring the execution of command-line commands.

    WHY: This makes cluster installation easier and thus allows more junior and therefore lower-cost staff to install clusters.

  2. Nodes added to the cluster during initial installation or during later cluster expansion should automatically be subscribed to the workload management queues.

    WHY: This improves efficiency and thus reduces TCO.

  3. The cluster management software should take care of automatically configuring BMCs in the nodes.

    WHY: Configuring BMCs is a time consuming process, thus this reduces TCO.

  4. It should be possible to perform a remote head node installation by logging in to the DVD installation environment and driving the installation remotely.

    WHY: Being able to manage a cluster installation remotely greatly improves efficiency and can save expensive trips to remote locations. Furthermore, cluster administrator(s) can thus minimize their presence in the server room to maximize work efficiency and minimize health & safety and security exposure.

  5. The configuration of the cluster, such as network settings, hostname, node names, etc. should be easy to change after the installation.

    WHY: This improves efficiency and therefore reduces TCO.

COMPETITION: Rocks does not have a graphical installation mode and is notorious for requiring complete reinstallation when configurations need to be changed after installation.

Power Management

  1. It should be possible to manage power to nodes by making calls to PDUs to perform operations on the PDU port that a node is connected to.

    WHY: Automated power management improves efficiency and thus reduces TCO.

  2. It should be possible to define custom power methods for devices that cannot be power-managed through IPMI or a PDU. For example, it should be possible to interact with a blade chassis management controller to perform power operations (see the sketch after this list).

    WHY: Automated power management improves efficiency and thus reduces TCO.
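
As an illustration of requirement 2, the following Python sketch shows the shape of a custom power method script; the chassis controller, its command syntax, and the node-to-bay mapping are entirely hypothetical:

    # Custom power method: a script that the cluster manager calls with a node
    # name and an operation, and that talks to a (hypothetical) blade chassis
    # management controller. The controller command line is an assumption.
    import subprocess
    import sys

    CHASSIS = {"node001": ("chassis01", 3),   # node -> (controller, bay)
               "node002": ("chassis01", 4)}

    def power(node, operation):               # operation: "on", "off", "reset" or "status"
        controller, bay = CHASSIS[node]
        # Illustrative only: send the operation to the chassis controller over ssh.
        return subprocess.run(["ssh", controller, f"power {operation} bay {bay}"],
                              capture_output=True, text=True).returncode

    if __name__ == "__main__":
        node, operation = sys.argv[1], sys.argv[2]
        sys.exit(power(node, operation))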

Miscellaneous

  1. The cluster management software must be Intel Cluster Ready (ICR) Certified.

    WHY: ICR certification of the management software ensures that ICR certified applications will run on the cluster without modification.

  2. All cluster management events happening on the cluster must be viewable in a single location that also allows the filtering of events for different types of events.

    WHY: This functionality improves insight into the health and activity of the cluster and therefore improves efficiency and reduces TCO.

  3. Disk layouts for slave nodes should allow regular disk partitions, software RAID arrays, LVM volumes, and combinations of those three.

    WHY: This adds flexibility and increases efficiency, thereby reducing TCO.

  4. A cluster management environment should give administrators the possibility to manage network interfaces of slave nodes from a single interface. Channel bonded interfaces, tagged VLAN interfaces, and alias interfaces should all be supported.

    WHY: This adds flexibility and increases efficiency, thereby reducing TCO.

  5. The cluster management environment should take care of generating DNS zones so that it is trivial to reach a particular interface of a node by name (e.g. node015.ipmi.cluster, node015.ib.cluster, node015.storage.cluster); see the sketch after this list.

    WHY: This improves efficiency and thereby reduces TCO.

  6. The Linux kernel and initrd for slave nodes should always be loaded over the network on node boot-up time.

    WHY: If the Linux kernel and initrd are on the local disk, the cluster will not be able to start if a bad kernel is deployed. All nodes would have to be re-provisioned.
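
As an illustration of requirement 5, the following Python sketch generates per-interface A records from a node/interface table; the addresses and domain names are illustrative assumptions:

    # Generate per-interface DNS records so that every interface of a node can
    # be reached by name. The node/interface table and domain are illustrative.
    INTERFACES = {
        "node015": {"internal": "10.141.0.15",
                    "ipmi": "10.148.0.15",
                    "ib": "10.149.0.15"},
    }

    def zone_records(domain="cluster"):
        records = []
        for node, interfaces in INTERFACES.items():
            for ifname, ip in interfaces.items():
                # The "internal" interface gets the plain node name; every other
                # interface gets its own sub-zone, e.g. node015.ipmi.cluster.
                fqdn = f"{node}.{domain}." if ifname == "internal" else f"{node}.{ifname}.{domain}."
                records.append(f"{fqdn:<40} IN A {ip}")
        return records

    print("\n".join(zone_records()))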

COMPETITION: xCAT is not listed as Intel Cluster Ready; Rocks and PCM are.