To extract the most value from your HPC cluster, you need to ensure that system resources are properly utilized. HPC users are notorious for over-requesting resources for their jobs, leaving resources idle or underutilized that could otherwise be doing work for other jobs. Some users hoard resources to guarantee they have what they need, but another common reason for over-requesting is that users simply don't know what resources their jobs need to finish within a specified time. To ensure that precious and expensive cluster resources aren't being squandered, administrators need actionable details about how those resources are being used. Among other things, they need to know which jobs are using which resources, which jobs aren't using the resources they've provisioned, and which users are repeatedly hoarding resources unnecessarily.
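As a rough illustration of the kind of reporting described above, the sketch below compares what each job requested with what it actually used, flags underutilized jobs, and lists users who over-request repeatedly. Everything in it is an assumption made for illustration: the `JobRecord` fields, the 50% threshold, and the sample data are hypothetical and do not come from any particular scheduler or accounting tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical fields; real accounting data will differ by scheduler.
    job_id: str
    user: str
    cores_requested: int      # assumed to be > 0
    avg_cores_used: float
    mem_requested_gb: float   # assumed to be > 0
    peak_mem_used_gb: float


def utilization(job):
    """Return (cpu_fraction, mem_fraction) of the requested resources actually used."""
    cpu = job.avg_cores_used / job.cores_requested
    mem = job.peak_mem_used_gb / job.mem_requested_gb
    return cpu, mem


def underutilized_jobs(jobs, threshold=0.5):
    """Jobs whose CPU or memory use fell below `threshold` of the request."""
    return [j for j in jobs if min(utilization(j)) < threshold]


def repeat_over_requesters(jobs, threshold=0.5, min_jobs=2):
    """Users with at least `min_jobs` underutilized jobs, sorted by name."""
    counts = Counter(j.user for j in underutilized_jobs(jobs, threshold))
    return sorted(u for u, n in counts.items() if n >= min_jobs)


# Made-up sample data for illustration only.
jobs = [
    JobRecord("1001", "alice", 32, 30.1, 128.0, 110.0),
    JobRecord("1002", "bob", 64, 8.0, 256.0, 40.0),
    JobRecord("1003", "bob", 64, 10.5, 256.0, 35.0),
    JobRecord("1004", "carol", 16, 15.0, 64.0, 20.0),
]

flagged = underutilized_jobs(jobs)
repeaters = repeat_over_requesters(jobs)

for job in flagged:
    cpu, mem = utilization(job)
    print(f"job {job.job_id} ({job.user}): cpu {cpu:.0%}, mem {mem:.0%} of request")
print("repeat over-requesters:", repeaters)
```

In practice the threshold and the choice of metrics (average versus peak usage, GPUs, wall time) would depend on the site's policies and on what the scheduler's accounting data exposes.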