Last week, I illustrated a common dialogue between IT administrators and executive leadership about the decision to build an HPC cluster management solution. Now that you have the green light to build, all of that money you “saved” by not using commercial cluster management software let you buy more hardware to support jobs from end users, right? You can see the extra servers on the floor, so you must be giving your users more capacity for work, right? Maybe not. Does your do-it-yourself approach tell you precisely which system resources end users are actually using, and for which jobs? Or are users requesting more resources than their jobs really need and then sitting on them, unused, while other users wait for access to do real work? And one more thing … how much of your server and system resources do the processes of your do-it-yourself cluster management solution consume, at the expense of real work for users? The point is that a cluster that appears to be 95% utilized is very likely to be far less productive than you think.