NVIDIA GPU Management & Monitoring
Bright Cluster Manager® includes GPU management and monitoring capabilities that use functionality built into NVIDIA® Tesla™ GPUs to give administrators full control over the GPUs and insight into their status and activity over time.
Bright also includes the necessary CUDA and OpenCL libraries.
Bright Cluster Manager makes it easy to plot one or more GPU metrics in a single graph.
Bright Cluster Manager can sample and monitor metrics from supported GPUs and GPU Computing Systems, such as the Kepler-architecture NVIDIA Tesla K40 GPU accelerator, as well as from multiple GPU accelerators housed in a single chassis (GPU Units, e.g. the Dell PowerEdge C410x PCIe Expansion Chassis).
Examples of supported metrics include:
- GPU temperatures;
- GPU exclusivity modes;
- GPU fan speeds;
- system fan speeds;
- PSU voltages and currents;
- system LED states;
- GPU ECC statistics (Fermi GPUs only).
See the table below for a complete overview of supported metrics and GPUs.
The metric sampling frequency is fully configurable, as is the consolidation of metrics data over time. Metrics data is stored in Bright Cluster Manager's central SQL database and can be visualized in value/time graphs as well as in Bright Cluster Manager's Rackview.
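To make "consolidation" concrete, here is a minimal Python sketch, assuming consolidation means summarizing raw samples into fixed-width time intervals by their minimum, average and maximum. It is not Bright Cluster Manager's implementation or API; the `consolidate` function, the `Consolidated` record, the 300-second interval and the sample temperatures are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Consolidated:
    """Summary of one time interval (hypothetical record, not a Bright type)."""
    start: int       # interval start time in seconds
    minimum: float
    average: float
    maximum: float


def consolidate(samples, interval):
    """Group (timestamp, value) samples into buckets `interval` seconds wide
    and summarize each bucket by its min, mean and max.

    Hypothetical helper illustrating the idea of consolidation only.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - ts % interval          # align timestamp to bucket start
        buckets.setdefault(start, []).append(value)
    return [
        Consolidated(start, min(vals), mean(vals), max(vals))
        for start, vals in sorted(buckets.items())
    ]


# Hypothetical raw GPU temperature samples: (timestamp in s, degrees C), one per minute
raw = [(0, 40.0), (60, 42.0), (120, 44.0), (300, 50.0), (360, 52.0)]

for row in consolidate(raw, 300):
    print(row)
```

Consolidating this way trades time resolution for storage: five raw samples become two interval summaries, while min and max keep short temperature spikes visible after averaging.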