The future of high-performance computing infrastructure can be summed up in one word: “Different”


At the beginning of each new year, many articles are written about what lies ahead in the coming year, especially in the fast-changing, ever-evolving area of tech.  When the new year is also the beginning of a new decade, the number of articles multiplies accordingly.

Until a few years ago, high-performance computing wasn’t changing or evolving at the same brisk pace as less mature areas.  By my estimation, that began to change in 2016 when things started to gel for commercial enterprises around machine learning, HPC in the cloud, and increased use of accelerators for compute-intensive workloads (NVIDIA GPUs in particular).  Since then, we’ve also seen increasing momentum with containers & Kubernetes that will eventually become prominent in HPC, a realization that compute will need to extend to “the edge” for IoT to feed AI and HPC models, and an increasing number of new chips from the likes of Arm and AMD that will diversify the HPC landscape beyond its Intel-centric history.  

Taken together, these changes paint a future for HPC infrastructure that looks very different from its past, and those differences will have implications for how HPC clusters are built and managed going forward.  In the past, HPC clusters were built with a fairly homogeneous and static mindset. The notion of combining x86 and Arm architectures in the same cluster was not the domain of mere mortals.  Extending your HPC cluster to the public cloud for additional capacity was something you planned to do down the road. Hosting containerized machine learning and data analytics applications on your HPC cluster, running harmoniously alongside traditional MPI-based modeling & simulation applications, was on your wish list.  Offering end users bare metal, VMs, and containers on the same cluster was a pipe dream. Deploying edge compute in a way that makes it integral to your core HPC infrastructure rather than a red-headed stepchild (no offense to the gingers out there) fell under the category of “maybe someday.”

Yet somehow, the pressure is on to make all of these things happen now. Right now.  And, while “different” may be the one word that sums up the future of HPC, “complexity” is the one word that woefully understates the reality of what organizations now face in getting this done.  

In the bygone days of traditional HPC, building custom scripts to integrate a collection of different open-source tools for server provisioning (e.g., xCAT), monitoring (e.g., Ganglia), alerts (e.g., Nagios), and change management was difficult and labor-intensive, but possible for organizations with the time and skills to do so.  But in the emerging world of HPC, where all of the new realities mentioned above must be fulfilled quickly, the old DIY way will put a chokehold on your business.

