If you work in deep learning on NVIDIA GPUs, you may have heard of the NVIDIA GPU Cloud, or NGC for short. NGC provides pre-integrated, GPU-accelerated containers that you can use to power your own AI projects.
There are times when you need to customize the operating system that servers in your cluster are running in order to enable some special capability. One example is enabling a server to take advantage of accelerator or GPU hardware by installing a kernel module. Bright does this automatically for some of the most popular devices, like NVIDIA GPUs, but you may have other needs. For example, kernel modules often are required to control particular storage, network, or other devices.
Carrying out such low-level modifications on a server can be risky. If done incorrectly, kernel modifications could cause the server to fail. The risk is compounded in a clustered environment since different servers may need different kernel modules to load every time they are restarted.
Fortunately, Bright Cluster Manager can minimize the risk of installing kernel modules and automate the process so that the cluster’s operation remains consistent.
Here’s how it’s done:
This article describes how to configure jumbo frames in Bright Cluster Manager using the Cluster Management Shell (CMSH). Jumbo frames are enabled by changing the MTU (Maximum Transmission Unit) on all the relevant switches and network interfaces. Bright makes it easy.
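Because jumbo frames only work when every interface on the path uses the larger MTU, it can help to check a node after the change. The following is a minimal sketch of such a check, not the CMSH procedure the article describes: it assumes a Linux node that exposes each interface's MTU under `/sys/class/net/<interface>/mtu`, and it assumes a target of 9000 bytes, a common jumbo-frame size that may differ on your network. The names `JUMBO_MTU`, `read_mtu`, and `interfaces_below` are made up for this illustration.

```python
from pathlib import Path

# Assumption: 9000 bytes is the jumbo-frame MTU in use; adjust for your network.
JUMBO_MTU = 9000


def read_mtu(interface, sysfs_root="/sys/class/net"):
    """Return the MTU of one interface, as reported by Linux sysfs."""
    return int((Path(sysfs_root) / interface / "mtu").read_text().strip())


def interfaces_below(mtu=JUMBO_MTU, sysfs_root="/sys/class/net"):
    """Return {interface: current_mtu} for interfaces whose MTU is below `mtu`."""
    too_small = {}
    for entry in sorted(Path(sysfs_root).iterdir()):
        mtu_file = entry / "mtu"
        if not mtu_file.is_file():
            continue  # skip entries that are not network interfaces
        value = int(mtu_file.read_text().strip())
        if value < mtu:
            too_small[entry.name] = value
    return too_small


if __name__ == "__main__":
    # Lists every local interface that has not been raised to the jumbo MTU.
    for name, value in interfaces_below().items():
        print(f"{name}: MTU {value} is below {JUMBO_MTU}")
```

Run on a node, the script lists any interface that is still below the target MTU. Switch ports have to be checked separately, since they are not visible from the node.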
This article shows how you can easily manage Slurm jobs using the Bright Cluster Management Shell (CMSH). In job mode, the CMSH allows you to perform the same job management operations as the CMGUI through a convenient shell interface. For an example of managing jobs using the Bright CMGUI, check out my previous article on this topic.
The Bright Cluster Manager CMGUI makes tasks intuitively easy. This article shows how you can view and control workload manager jobs using the Bright CMGUI. I am using an OGS (SGE) job to provide examples, but Bright works the same way with all of Bright's supported workload managers: PBS Professional, Slurm, Univa Grid Engine, LSF, openlava, TORQUE/Moab, and TORQUE/Maui.
This article describes basic Slurm usage for Linux clusters. Brief "how-to" topics include, in this order:
- A simple Slurm job script
- Submit the job
- List jobs
- Get job details
- Suspend a job (root only)
- Resume a job (root only)
- Kill a job
- Hold a job
- Release a job
- List partitions
- Submit a job that's dependent on a prerequisite job being completed
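
The topics above map onto a handful of standard Slurm commands. The sketch below is one way to lay them out, not the article's own walkthrough: it only builds the command lines (as argument lists) and does not run them, so it works without a cluster. The job script contents, the file name `job.sh`, the job IDs, and the helper name `slurm_commands` are all illustrative assumptions. The dependency example assumes the `afterok` condition, which starts the job only after the prerequisite finishes successfully.

```python
import shlex

# A simple Slurm job script (illustrative; job name and output file are assumptions).
# %j in the output file name is replaced by the job ID.
JOB_SCRIPT = """#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out
#SBATCH --ntasks=1
hostname
"""


def slurm_commands(job_id, script="job.sh", prereq_id=None):
    """Return one Slurm command line per topic, keyed by a short topic name."""
    job = str(job_id)
    commands = {
        "submit": ["sbatch", script],
        "list_jobs": ["squeue"],
        "job_details": ["scontrol", "show", "job", job],
        "suspend": ["scontrol", "suspend", job],   # root only
        "resume": ["scontrol", "resume", job],     # root only
        "kill": ["scancel", job],
        "hold": ["scontrol", "hold", job],
        "release": ["scontrol", "release", job],
        "list_partitions": ["sinfo"],
    }
    if prereq_id is not None:
        # Run only after the prerequisite job completes successfully.
        commands["submit_dependent"] = [
            "sbatch", f"--dependency=afterok:{prereq_id}", script,
        ]
    return commands


if __name__ == "__main__":
    # Print each command as it would be typed at the shell.
    for topic, argv in slurm_commands(1234, prereq_id=1233).items():
        print(f"{topic:17s} {shlex.join(argv)}")
```

To run one of these commands on a cluster where Slurm is installed, pass the argument list to `subprocess.run`, for example `subprocess.run(slurm_commands(1234)["job_details"], check=True)`.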