Vendors put all of their chips on the table at SC19


One of the key things that solidified for me coming out of the Supercomputing 2019 (SC19) conference in Denver is how different high-performance computing infrastructure is going to look in a few years. Here are just a handful of the many chip-related announcements at SC19 that foreshadow what the future holds:

  • Intel announced its “XPU” strategy, spanning CPUs, GPUs, FPGAs, and NNPs
  • On the heels of its Arm-based A64FX processor announcement, Fujitsu announced its own Arm-based PRIMEHPC supercomputers
  • NVIDIA announced an Arm-based reference design for GPU-accelerated Arm servers
  • Amazon AWS announced new EC2 compute instances based on the AMD EPYC processor
  • Cray and Fujitsu announced a collaboration to develop a commercial supercomputer based on the Fujitsu A64FX processor
  • Microsoft announced the availability of Graphcore IPU accelerators in the Azure cloud
  • Penguin announced new Altus servers that use AMD Radeon Instinct accelerators with AMD EPYC CPUs

Adding to the growing chip diversity unfolding in HPC, the landscape of workloads being deployed on high-performance Linux systems is also evolving:

  • HPC/AI convergence in terms of shared infrastructure and combined workflows
  • Growing use of public clouds for HPC
  • Machine learning applications graduating from proof of concept to production
  • Growing use of containerized workloads
  • IoT and 5G applications coupled with the build-out of edge compute

Taking all of this together, it’s pretty clear that we’re in for a wild ride. But all of this raises the question: how will organizations deploy and manage all of this new and different technology? Each technology offers distinct value for different use cases, and organizations will rightly want to take advantage of them, but they will probably do so independently of one another. Fast forward a few years, and I envision lots of loosely coupled, siloed systems that together will make a grown system administrator cry. Perhaps worse will be the cost, both monetary and opportunity, of single-purpose systems that are independently purchased and maintained.

What compelled me to join Bright Computing nearly four years ago was the breadth of capabilities in the product (Bright Cluster Manager) that make building and managing Linux clusters easy, and a general sense that those capabilities would become immensely valuable as the market evolved. One truly unique capability of Bright Cluster Manager is the ability to combine different chip architectures and operating systems in the same centrally managed cluster. That same cluster can extend to AWS or Azure for additional resources when necessary. That same cluster can deploy and manage edge servers in remote locations where skilled IT staff may not be available. That same cluster can simultaneously host HPC workload managers and Kubernetes and share compute resources between them based on workload demand. And that’s just the tip of the iceberg.
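
To make the resource-sharing idea concrete, here is a minimal, purely illustrative Python sketch of demand-based rebalancing between an HPC scheduler pool and a Kubernetes pool. The pool names and the simple node-shifting rule are assumptions for illustration only; they are not Bright Cluster Manager’s actual API or behavior.

```python
# Hypothetical sketch (not Bright Cluster Manager's actual API): illustrates the
# general idea of shifting shared compute nodes between an HPC workload manager
# and Kubernetes based on which side has the larger backlog of pending work.

from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    nodes: int
    pending_jobs: int


def rebalance(hpc: Pool, k8s: Pool, step: int = 1) -> None:
    """Move `step` nodes toward whichever pool has more pending jobs,
    leaving at least `step` nodes in the pool that gives them up."""
    if hpc.pending_jobs > k8s.pending_jobs and k8s.nodes > step:
        k8s.nodes -= step
        hpc.nodes += step
    elif k8s.pending_jobs > hpc.pending_jobs and hpc.nodes > step:
        hpc.nodes -= step
        k8s.nodes += step


if __name__ == "__main__":
    slurm = Pool("slurm", nodes=16, pending_jobs=40)
    kube = Pool("kubernetes", nodes=16, pending_jobs=5)
    rebalance(slurm, kube)
    print(slurm, kube)  # one node shifts toward the busier Slurm pool
```

In a real deployment the "move a node" step would mean reprovisioning or re-categorizing that node under the other scheduler; the point of the sketch is simply that one shared set of machines can serve both workload types as demand shifts.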

So, when I think about what I saw unfolding at SC19 and think about what our product can do, I can’t help but smile.