Your production HPC cluster has been humming along happily for some time now. Why? In part, it’s because you have a well-established software image that is used by all of the compute nodes in your cluster. You created this software image, tested it and ultimately deployed it into production. All is well - until now.
Whether it’s motivated by internal or external forces, change is inevitable. In some cases, there may be a pressing need to respond rapidly - for example, patching operating systems to close a security vulnerability before it can be exploited. In other cases, change may be the outcome of a process that requires upgrades to application software. Even though changes like these don’t require hardware neurosurgery, they can be the cause of serious concern. Let’s face it: there are entire frameworks, like ITIL, devoted to managing processes like change.
If you were dealing with source code and had concerns about change, a traditional revision control system like SVN, Git, CVS or RCS would be a no-brainer solution. As wonderful as these tools are for managing source code, they were never designed to be applied to software images. Why? They do not handle the nuances of entire file systems - from ownerships, permissions and special file types such as the device nodes under /dev, to scales on the order of multiple GBs of data.
Using revision control for source code as our guide, however, sysadmins might appreciate a capability that addresses similar requirements for software images - a capability that allows them to:
- Register a documented revision of a software image after making changes
- Obtain an overview of all registered revisions of a software image along with documented changes
- Select a registered revision of a software image for use by a set of nodes
- Purge registered revisions that are no longer required (e.g., to free up storage) while preserving their documentation
- Revert to a specified registered revision - a process that discards the active software image and replaces it with a known registered revision
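The operations in this list can be sketched as a small in-memory data model. This is only an illustration of the bookkeeping involved, not Bright's implementation: the `Revision` and `ImageRegistry` names, their fields, and the `image@number` label format are all hypothetical, and no image data is actually stored or copied.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Revision:
    number: int
    description: str      # the documented change
    purged: bool = False  # True once the stored image data has been discarded


@dataclass
class ImageRegistry:
    """Hypothetical revision bookkeeping for a single software image."""
    name: str
    revisions: List[Revision] = field(default_factory=list)
    active: Optional[int] = None  # revision the working image was last reverted to

    def register(self, description: str) -> Revision:
        # Records are never deleted, so len() + 1 is always the next free number.
        rev = Revision(number=len(self.revisions) + 1, description=description)
        self.revisions.append(rev)
        return rev

    def overview(self) -> List[str]:
        lines = []
        for rev in self.revisions:
            state = "purged" if rev.purged else "stored"
            lines.append(f"{self.name}@{rev.number} [{state}] {rev.description}")
        return lines

    def get(self, number: int) -> Revision:
        # Look up a registered revision, e.g. to select it for a set of nodes.
        for rev in self.revisions:
            if rev.number == number:
                return rev
        raise KeyError(f"{self.name} has no revision {number}")

    def purge(self, number: int) -> None:
        # Drop the data but keep the record, so the documentation survives.
        self.get(number).purged = True

    def revert(self, number: int) -> Revision:
        rev = self.get(number)
        if rev.purged:
            raise ValueError(f"revision {number} was purged and cannot be restored")
        self.active = rev.number
        return rev
```

Keeping purged revisions in the list, rather than deleting them, is what lets the overview continue to show their documented changes after the storage has been reclaimed.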
The next generation of block-device filesystems is introducing a snapshot capability into Linux. Using this capability to provide revision control for software images is an increasingly attractive possibility. Of course, revision control is still needed in contexts where a snapshot service is not yet available.
Bright Cluster Manager will provide a revision-control capability for software images. Where btrfs is available, Bright makes use of snapshots in delivering this new feature. Where btrfs is not supported, Bright is still able to provide this important management capability. Revisions of software images can be managed using Bright’s GUI or command-line interface - in fact, there will be standalone commands that don’t even require use of Bright’s interactive shell for cluster management.
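To make the snapshot-versus-fallback distinction concrete, here is a minimal sketch that only builds the command a tool might run in each case. It is an assumption for illustration, not a description of what Bright does internally: the `revision_copy_command` function and the example paths are hypothetical, and the full copy with `cp -a` is just one plausible fallback. The `btrfs subvolume snapshot -r` invocation is the standard btrfs-progs way to create a read-only snapshot.

```python
from typing import List


def revision_copy_command(image_dir: str, revision_dir: str, on_btrfs: bool) -> List[str]:
    """Return the command that would capture image_dir as a revision.

    Nothing is executed here; the caller decides whether to run the command.
    revision_dir is assumed not to exist yet.
    """
    if on_btrfs:
        # Read-only copy-on-write snapshot: near-instant, and unchanged blocks
        # are shared with the source. image_dir must be a btrfs subvolume.
        return ["btrfs", "subvolume", "snapshot", "-r", image_dir, revision_dir]
    # Assumed fallback: a full archive copy that preserves ownership,
    # permissions, symlinks and special files such as device nodes.
    return ["cp", "-a", image_dir, revision_dir]


# Hypothetical paths, for illustration only.
print(revision_copy_command("/images/compute", "/images/compute-r3", on_btrfs=True))
print(revision_copy_command("/images/compute", "/images/compute-r3", on_btrfs=False))
```

The trade-off is cost: a snapshot is cheap in both time and space, while a full copy takes time and storage proportional to the size of the image.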
Bright Cluster Manager allows nodes to be logically organized into categories - e.g., there is likely to be a compute-nodes category. Because software images are associated with categories rather than with individual nodes, a single association ensures that all compute nodes use the same software image. Bright admins will be able to synchronize categories of nodes to a specific revision of a software image or to the latest revision.
Because Bright Cluster Manager provisions nodes using a flexible and scalable approach, sysadmins can introduce modified software images in a highly controlled way. Should the need present itself, changes can be backed out just as readily.
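The category-based association can be sketched as a simple lookup: each node belongs to a category, and each category names an image plus either a pinned revision or "latest". This is a hypothetical model for illustration only; the function names, the data layout, and the use of `None` to mean "latest" are assumptions and do not reflect Bright's actual interfaces.

```python
from typing import Dict, List, Optional, Tuple


def resolve_revision(available: List[int], pinned: Optional[int]) -> int:
    """Pick the revision a category should use; pinned=None means 'latest'."""
    if not available:
        raise ValueError("image has no registered revisions")
    if pinned is None:
        return max(available)
    if pinned not in available:
        raise ValueError(f"revision {pinned} is not registered")
    return pinned


def plan_sync(
    categories: Dict[str, Tuple[str, Optional[int]]],  # category -> (image, pinned revision)
    nodes: Dict[str, str],                             # node -> category
    revisions: Dict[str, List[int]],                   # image -> registered revision numbers
) -> Dict[str, str]:
    """Map every node to the image revision implied by its category."""
    plan = {}
    for node, category in nodes.items():
        image, pinned = categories[category]
        plan[node] = f"{image}@{resolve_revision(revisions[image], pinned)}"
    return plan


# Hypothetical cluster: compute nodes follow the latest revision, GPU nodes are pinned.
categories = {"compute": ("compute-image", None), "gpu": ("gpu-image", 2)}
nodes = {"node001": "compute", "node002": "compute", "gpu001": "gpu"}
revisions = {"compute-image": [1, 2, 3], "gpu-image": [1, 2, 3, 4]}
print(plan_sync(categories, nodes, revisions))
```

Because the revision is resolved per category, changing one entry in `categories` moves every node in that category to the new revision, or back to an earlier one.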
Revision control for software images will appear for the first time with the release of version 7 of Bright Cluster Manager for HPC. At that release, the capability will also be available in Bright Cluster Manager for Apache Hadoop and Bright Cluster Manager for OpenStack. Because enterprises need to manage change, we expect this will be a very welcome addition to our unified management solution for clusters and clouds of all types.