Homegrown Cluster Management: Just Because You Can, Doesn't Mean You Should (Pt. 2)

    

Last week, I illustrated a common dialogue between IT administrators and executive leadership about the decision to build an HPC cluster management solution. Now that the green light has been given to build, all of that money you “saved” by not using commercial cluster management software allowed you to buy more hardware to support jobs from end users, right? You can see the extra servers on the floor, so you must be providing your users with more capacity for work, right? Maybe not. Does the do-it-yourself approach you developed tell you precisely which system resources are actually being used by end users, and for which jobs? Or are users requesting more resources than their jobs really need and sitting on them, unused, while other users wait for access to do real work? And one more thing: how much of your server and system resources is being consumed by the processes of your do-it-yourself cluster management solution at the expense of real work for users? The point is that a cluster that appears to be 95% utilized is very likely to be far less productive than you think.
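To make the gap between "allocated" and "actually used" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Job` record, its field names, and the sample numbers are hypothetical and do not come from any real scheduler or from Bright Cluster Manager. It simply compares the core-hours that jobs reserved with the core-hours they really consumed.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One job's accounting record (hypothetical fields, for illustration only)."""
    user: str
    cores_allocated: int    # cores reserved for the job
    wall_hours: float       # how long the reservation was held
    core_hours_used: float  # CPU time the job actually consumed


def _capacity(total_cores: int, window_hours: float) -> float:
    """Total core-hours the cluster could deliver in the reporting window."""
    capacity = total_cores * window_hours
    if capacity <= 0:
        raise ValueError("cluster capacity must be positive")
    return capacity


def allocated_utilization(jobs, total_cores, window_hours):
    """Fraction of capacity that was reserved (what a naive dashboard shows)."""
    reserved = sum(j.cores_allocated * j.wall_hours for j in jobs)
    return reserved / _capacity(total_cores, window_hours)


def effective_utilization(jobs, total_cores, window_hours):
    """Fraction of capacity that did real work."""
    used = sum(j.core_hours_used for j in jobs)
    return used / _capacity(total_cores, window_hours)


def job_efficiency(job):
    """Share of a job's own reservation that it actually used."""
    return job.core_hours_used / (job.cores_allocated * job.wall_hours)


# Made-up sample data: a 100-core cluster observed for 24 hours.
TOTAL_CORES = 100
WINDOW_HOURS = 24.0
jobs = [
    Job(user="alice", cores_allocated=40, wall_hours=24.0, core_hours_used=900.0),
    Job(user="bob", cores_allocated=55, wall_hours=24.0, core_hours_used=330.0),
]

print(f"allocated: {allocated_utilization(jobs, TOTAL_CORES, WINDOW_HOURS):.0%}")
print(f"effective: {effective_utilization(jobs, TOTAL_CORES, WINDOW_HOURS):.0%}")
for job in jobs:
    print(f"{job.user}: {job_efficiency(job):.0%} of reservation used")
```

With these invented numbers, the cluster looks 95% allocated while only about 51% of its capacity does real work, because one job holds 55 cores and uses a quarter of them. The sketch leaves out the overhead of the management processes themselves, which would lower the effective figure further.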

But maybe you’re OK with all of the staff expense associated with your “free” cluster management approach. And perhaps you’re paying that staff enough to keep them securely in place, given how dependent you are on them. And maybe ignorance is bliss when it comes to not knowing how much net cluster resource is actually available to end users for real work.

Assuming you are OK with all of that, there’s one more thing to consider: Can your cluster readily support the important new types of applications and technologies that users are asking for, such as machine learning, or do you have to do more development work to support new workloads (or perhaps stand up an entirely new cluster)? The historical mindset of building a cluster management solution might have been acceptable in the static past, but it will increasingly fall flat as businesses maneuver to secure their future in an increasingly disruptive world. Agility is derived from technology, and agility will determine which companies thrive and which die. Linux clusters can provide the agility that businesses need to deliver innovation across a growing spectrum of data- and compute-intensive applications, but they must be inherently agile themselves. They must be capable of supporting a range of applications (HPC, machine learning, data analytics), running in various forms (bare metal, VMs, and containers), across all types of compute infrastructure (on-prem, public cloud, and edge), and they must exist as a single shared and managed infrastructure, not a collection of silos. Organizations that have such an infrastructure will have a competitive advantage over their peers. Bright Cluster Manager can provide you with that type of infrastructure today.

All of this underlies the original point of this blog post: “Just because you can build your own cluster management solution, doesn’t mean you should.” Building a cluster manager has never been easy, and the new realities mentioned above will make it even harder going forward. Building your own solution takes valuable time and yields zero strategic value to your business. It may very well cost you more in the end, once you account for the staff costs of doing so and for the utilization you fail to squeeze from your cluster’s resources.

Again, the decision to build or buy HPC cluster management is no longer a tactical one for businesses; it is a strategic one. If your company is looking for a legitimate, enterprise-class cluster management solution and you’ve weighed build vs. buy, look no further. Bright Cluster Manager lets you easily create, monitor, and maintain HPC clusters from a single comprehensive user interface, maximizing administrator productivity and business agility. Contact us today at info@brightcomputing.com to learn more about how Bright Computing can help.