To extract the most value from your HPC cluster, you need to ensure that system resources are being properly utilized. HPC system users are notorious for over-requesting resources for their jobs, leaving resources idle or underutilized when they could be doing work for other jobs. One reason is that users hoard resources to ensure they have what they need; another common reason is that they simply don’t know what resources their jobs will need to complete in a specified time. To ensure that their precious and expensive cluster resources aren’t being squandered, administrators need actionable details about how those resources are being used. More specifically, they need to know which jobs are using which resources, which jobs aren’t using the resources provisioned for them, and which users are repeatedly hoarding resources unnecessarily.
A significant change we’ve seen in the HPC landscape is the need to process data and run workloads at the edge. Last year, Bright delivered Bright Edge, which allows organizations to quickly and easily provision and manage servers at edge locations and enables them to manage multiple locations as a single cluster. The new year brings with it Bright 9.0, which adds the ability to deploy and manage a workload manager (WLM) instance at each location.
Auto-scaling HPC and Kubernetes
Bright CEO Bill Wagner’s recent blog post “The Convergence of HPC and A.I.” pointed out that creating HPC and Kubernetes silos within a shared HPC infrastructure is more like “coexistence than convergence”. The solution I will be highlighting is Bright Auto-scaler, which automatically resizes HPC and Kubernetes clusters (workload engines) according to workload demand and configured policies. This post walks through a scenario showing how to configure Bright Auto-scaler to achieve true convergence of HPC and A.I.
Each year we ask our customers to participate in a survey that helps us understand how they use and benefit from Bright Cluster Manager. We ask them a series of questions about the technologies they use, and how they use Bright in their day-to-day operations. And some of you had some very interesting things to say.
One of Bright Cluster Manager’s most popular features is its built-in monitoring system. It is lightweight and efficient, and it works right out of the box. But what people generally don’t know is that they can use Bright to monitor non-Bright nodes: nodes that were neither provisioned nor managed by Bright. The Bright Lightweight CMDaemon can be used to monitor and health-check auxiliary servers that support the cluster but aren’t part of it, for example, database servers, authentication servers, and file servers.
Data center managers are continually under pressure to deliver more results from their existing computing resources. One strategy that has proven successful is the aggregation of what have traditionally been computing silos into a modern, efficient shared compute cluster.
If you’re working in deep learning using NVIDIA GPUs, you may have heard about the NVIDIA GPU Cloud, or NGC for short. NGC provides pre-integrated GPU-accelerated containers that you can use to power your own AI projects.
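As an illustrative sketch of how NGC containers are typically used (the framework image and tag shown are assumptions; browse the NGC catalog for the containers and tags that fit your project), a Docker workflow might look like:

```shell
# Authenticate to the NGC registry (requires an NGC API key)
docker login nvcr.io

# Pull a GPU-accelerated framework container (tag is illustrative)
docker pull nvcr.io/nvidia/tensorflow:24.01-tf2-py3

# Run it interactively with access to all local GPUs
docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:24.01-tf2-py3
```

Because the frameworks, drivers hooks, and libraries are pre-integrated in the container, you skip most of the dependency wrangling that GPU software stacks usually require.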
This article describes how to configure jumbo frames in Bright Cluster Manager using the Cluster Management Shell (CMSH). Jumbo frames are enabled by changing the MTU (Maximum Transmission Unit) on all the relevant switches and network interfaces. Bright makes it easy.
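As a minimal sketch of the idea (the network name `internalnet` and the prompt text are illustrative; use the network objects defined on your own cluster), a CMSH session that raises the MTU on a managed network might look like:

```
% cmsh
[headnode]% network
[headnode->network]% use internalnet
[headnode->network[internalnet]]% set mtu 9000
[headnode->network*[internalnet*]]% commit
```

Note that Bright propagates the MTU to the node interfaces on that network, but the switches in the path must also be configured for jumbo frames separately, or large frames will be dropped.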