By Andrew O’Neill, Solutions Architect, Silicon Mechanics | November 17, 2020 | HPC
Every once in a while, you face a situation that you just never expected. That was the case for Silicon Mechanics earlier this year.
A client asked us to set up a smaller-scale high-performance computing (HPC) cluster based on a much larger cluster Silicon Mechanics designed for the national laboratory where the client worked.
Our task was to deliver the same kind of performance he was used to on the large cluster, within the much smaller budget allocated to his specific project. This was an exciting design challenge, but it wasn’t the biggest hurdle we had to overcome. His budget covered a robust system but not a system administrator, so he would be filling that role himself!
That meant the new cluster had to be much easier to run than the national lab cluster he was used to accessing as a user. Cluster management had to be at the forefront of this design. Even massive computing power means little if the hardware is too hard to manage while you are also running your own calculations.
Our first step was to suggest the cluster be provisioned with Bright Cluster Manager from Bright Computing.
The client had heard of Bright before but had not used it very much. We discussed it with him in depth and, when he was comfortable with the idea, we deployed the software. We thought Bright’s web-based graphical user interface, Bright View, was a much better fit for him than the traditional command line interface (CLI).
The intuitive software would also make it easy for our client to learn how to provision nodes, install and update applications, and set up and maintain separate node images and queues. It also helped simplify the complex Lustre deployment we had installed.
But we also had to help him get all the system administrator training he needed to make his deployment successful.
The Silicon Mechanics team had mapped out a hands-on training program for our new system administrator. However, that plan became impossible because of travel restrictions related to the onset of COVID-19. Instead, we suggested a rigorous online-only training program. The client agreed, and the engineering team quickly created an online curriculum that replicated the situations a system administrator would encounter on this specific system.
The client embraced the online curriculum and mastered the training. It helped that the same Silicon Mechanics engineering group that set up and configured the cluster before shipment also designed and conducted the online training. The team lead made sure the training reflected not only the hardware design but also the needs of a new system administrator.
You can read the full story on our website, www.SiliconMechanics.com, but one thing is certain: this deployment would have had little chance of success without cluster management software like Bright Cluster Manager to turn to.
Bright recently recorded a podcast interview between me and Jack Hanna, Director of Alliances at Bright. You can listen to the podcast in the following ways: