With the start of 2020 only three months away, we’re now building out our plans for the year ahead so we can hit the ground running on January 1st.
It’s easy to get caught up in the hype and excitement of new areas like machine learning and edge computing, especially when everyone is jumping on the bandwagon (almost willing it into reality). But hype almost always exceeds reality, and things never quite materialize as quickly as we thought (or hoped) they would. For tech companies that need to establish themselves in emerging markets, the trick is to invest enough to stay one or two steps ahead of customers, but not invest too far ahead of the curve in case things don’t materialize, or the road takes a hard left when you expected it to go right.
With that in mind, when I look at the investments we’re making in cluster management for HPC, machine learning and edge computing going forward, here are my observations about 2019 and what we expect in 2020:
Cluster Management for HPC
In 2019, we saw the biggest shift in four years: commercial businesses moving away from a “build it with open source tools” mentality toward a “buy it” mentality. Organizations give many reasons for this shift, but the three themes we hear repeatedly are 1) a shortage of skilled staff (and the risk of depending on them) to build and maintain the scripts that connect piecemeal provisioning and monitoring tools, 2) a desire for a cluster management tool that is efficient, easy to use, and helps drive innovation faster, and 3) a desire for the flexibility to quickly incorporate new technologies and easily host new workloads.

In 2020 I expect this trend not only to continue but to accelerate, because the primary reasons for the shift aren’t going away, and organizations understand that they have to be smarter about their build vs. buy decisions. For most organizations, building a cluster management solution won’t help them compete more effectively, but using a commercial solution like Bright that enables them to innovate faster will.

So the investments we are making to extend our value for HPC focus on helping admins understand cluster resource utilization even better than before; extending our support for mixed hardware architectures and workload managers within the same cluster; streamlining our integration with public clouds; improving data management for hybrid cloud and cloud bursting; and extending our scaling capabilities to more than 100,000 servers in a single cluster. These are the areas we think will best serve our customers based on where we see things going.
Cluster Management for Machine Learning
Thus far in 2019, it remains early days for machine learning in commercial enterprises. Much of the focus is still on developing and validating use cases with small-scale projects that demonstrate value to the business. As a result, from a technology perspective, the emphasis is on enabling data scientists to develop, train, and prove out models that can eventually be put into production.

I think that largely remains the case heading into the new year, but by mid-2020 I expect we’ll begin to see a notable shift toward traditional production concerns: the scaling, reliability, and manageability of machine learning applications, particularly for training models. That means enterprises will need HPC-like systems with sufficient power and performance to support production deployments, systems that can also be managed and maintained effectively. Achieving scale also means cost-effectively supporting multiple data scientists developing and training compute-intensive models at the same time, so again, it begins to look very much like an HPC environment. Bright has already built these capabilities into Bright Cluster Manager, with our Jupyter integration, pre-tested machine learning packages, and support for NGC containers, and we’ll take this further in 2020 based on feedback from customers.
Cluster Management for The Edge
Like machine learning, it remains early days for edge computing. We see a lot of early interest from customers as they develop edge use cases and strategies for their businesses, and whether it’s smart factory, smart city, smart retail, or smart anything else, the focus right now is on proving out concepts that will eventually go to full-scale production. When that happens, just as with machine learning, organizations will have to address scalability, reliability, and manageability.

At face value, the similarities between edge computing and HPC cluster management might not be readily apparent. But in both cases, servers need to be provisioned, managed, diagnosed, fault-isolated, and monitored by an administrator, often in large quantities, and edge computing adds a new set of physical challenges that come with remote locations. More than a year ago, we added Bright Edge to our flagship Bright Cluster Manager product to let organizations remotely provision, configure, manage, and monitor any number of distributed edge servers, from bare metal, all from a central location through our existing administrative interface. And although we’ve already built resiliency into Bright Edge so that edge servers can continue to run and process data if the network connection is lost, we’ll be introducing exciting new features that address the challenges organizations will face as they transition edge to full-scale production.
Cluster Management and the Convergence of HPC, Machine Learning and Edge Infrastructure
Today, HPC, machine learning and edge are typically tackled as distinct endeavors, but if you squint, you’ll see the lines beginning to blur. At Bright, we’re way beyond squinting … we’ve been busy building what we see as the inevitable end game. Reach out if you’d like to learn more.