By Dan Kuczkowski | March 21, 2019 | Bright Cluster Manager, machine learning, data science
Everyone in the computer business knows just how hot the machine learning (ML) space is today. Companies are placing high expectations, and heavy demands, on their AI data scientists. Most of these people view the GPU-laden computers they use for analysis as just a tool. Often, each data scientist is given their own powerful computer, and they don’t want to be burdened with operationalizing it. Simply put, they just want to run their jobs as quickly as possible.
Most companies want to provide their data scientists with powerful compute resources. However, as these computers scale in performance, especially those incorporating the latest GPUs, they are becoming extremely costly to procure and operate. Likewise, IT organizations dealing with large-scale AI compute are often buried in managing these unique ML resources and keeping up with all the evolving technologies. Being good stewards of the company’s budget thus puts IT in the challenging position of staying current with this technology while also getting the most out of the company’s compute resources.
Is there a life preserver that can be thrown to both the data scientists and IT administrators? I think so. It may be surprising, but high-performance computing (HPC) environments have similar characteristics and a similar need for extreme compute. Moreover, HPC users have had decades of experience fine-tuning their powerful compute environments to drive efficiency and manageability across a broad range of workloads. What are some HPC capabilities that could also be leveraged by ML environments? Cluster management software, for one.
Here are a few HPC capabilities offered by Bright software for Data Science that would liberate both IT and the AI data scientist: