Bright Cluster Manager - Linux Cluster Monitoring

Monitoring

With Bright Cluster Manager®, a comprehensive set of hardware and software metrics can be monitored, visualized and analyzed in a variety of ways. Virtually all software and hardware metrics available to the Linux kernel, and all hardware metrics available to hardware management interfaces such as IPMI, are available. The Intelligent Platform Management Interface (IPMI) specification defines a set of common interfaces to a computer system which system administrators can use to monitor system health and manage the system.


Available Metrics

The metrics available by default on a cluster fall into two main categories:

  1. Cluster Metrics — These are metrics for the cluster as a whole, often summed or averaged over all regular nodes.
  2. Device Metrics — These are metrics for one individual node, such as a compute node, provisioning node, login node, or another type of node or device.

For each of the above categories, the following subcategories are available:

  1. CPU — Examples of metrics: speed, idle time, user time, system time, wait time.
  2. Disk — Examples of metrics: free space, used space, I/O performance, SMART data. Self-Monitoring, Analysis, and Reporting Technology (SMART) is a monitoring system for computer hard disks that detects and reports on various indicators of reliability, in the hope of anticipating failures.
  3. Memory — Examples of metrics: free memory, used memory, free swap space, used swap space, buffer memory, cache memory.
  4. Network — Examples of metrics: bytes sent/received, IP/TCP/UDP errors.
  5. Environmental — Examples of metrics: temperatures, fan speeds.
  6. Operating System — Examples of metrics: forks, load average, process count, running processes, uptime.
  7. Workload — Examples of metrics: running jobs, queued jobs, failed jobs, completed jobs, estimated delay, average job duration, average expansion factor.

Custom Metrics

In addition to the default metrics, you can easily add custom metrics for monitoring by using a custom metric collector script. This is a very simple script that captures a value and presents it in a consistent format to Bright Cluster Manager. Examples of custom metrics include values that can be read from an application or from a device such as a UPS, storage unit, firewall device, tape robot, SAN switch or KVM switch. Other interesting examples include metrics from scientific instruments connected to the cluster, such as a microscope, a telescope or a genome sequencer.
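
As an illustration, a collector of this kind could be as small as the following Python sketch. It assumes, hypothetically, that the monitoring system accepts a single numeric value printed on standard output; the exact output format that Bright Cluster Manager expects is not described on this page. The `SPOOL_DIR` variable and the function names are invented for the example.

```python
#!/usr/bin/env python3
"""Hypothetical custom metric collector: files waiting in a spool directory."""
import os


def count_pending(spool_dir: str) -> int:
    """Capture the value: count the regular files in the directory."""
    with os.scandir(spool_dir) as entries:
        return sum(1 for entry in entries if entry.is_file())


def format_metric(value: float) -> str:
    """Present the value in one consistent form: a plain decimal number."""
    return f"{value:.2f}"


if __name__ == "__main__":
    # SPOOL_DIR is an assumed environment variable; "." is only a placeholder default.
    directory = os.environ.get("SPOOL_DIR", ".")
    print(format_metric(count_pending(directory)))
```

Keeping the capture step and the formatting step in separate functions means the same formatting can be reused when the value comes from a different source, such as a UPS or a storage unit.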

Visualization with Graphs

All available metrics can be visualized using graphs. In the monitoring visualization window, multiple graphs can be shown simultaneously. A new graph is created by simply dragging a metric from the metrics tree into an empty graph area. Metrics can also be dragged into existing graph areas to allow for visual comparison between multiple metrics.

You can easily zoom in and out of graphs by dragging your mouse over an area of the graph. The monitoring system will then retrieve the required data automatically to rebuild the graph at a smaller or larger scale. Many features of the graphs can be customized. For example, graph line color and style, graph filling color and style, and graph transparency can all be configured.

All configurations of the monitoring visualization window can be saved for future use. So if you have built up an 8 x 6 matrix of 48 different graphs — each with its own customized color scheme — you can save this configuration and load it quickly later.

Visualization with the Rack View

All available metrics can also be visualized in the Rack View. The Rack View shows the rack layout of the cluster, with optionally one or two metrics displayed per node using a color scale.

If the order and size of the nodes, switches and other devices in the cluster are known to Bright Cluster Manager, they will be used to build the rack layout in the Rack View. Otherwise, the nodes and switches will be shown at equal size and in alphabetical order.

For clusters with many racks, the "zoom out" feature allows you to see the metric values in many racks simultaneously as a color map.

The Rack View is a very useful tool for visualizing what is going on in your cluster. For example, if you show CPU or system temperatures in the Rack View, you can immediately see if some parts of your cluster are running hotter than other parts. You can also use the Rack View to show two metrics simultaneously to see if they are correlated. For example, fan speeds and CPU temperatures will often show some level of correlation.
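
The Rack View shows such a relationship visually. To put a number on it outside the GUI, one option is the Pearson correlation coefficient, sketched below. This is not a documented Bright Cluster Manager feature; the per-node readings are made up for the example, and the function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up readings, one entry per node.
cpu_temp_c = [52.0, 55.5, 61.0, 67.5, 72.0]
fan_rpm = [4200, 4500, 5100, 5600, 6100]

print(round(pearson(cpu_temp_c, fan_rpm), 3))
```

A value close to 1 means that nodes with higher CPU temperatures also tend to have higher fan speeds; a value close to 0 means no linear relationship.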

Configuration of the Monitoring System

The Bright monitoring system is fully configurable to match your needs and preferences. Some examples of configurable settings include:

  1. Which default and custom metrics to monitor. For example, you can stop certain metrics from being sampled, but you can also just stop metrics from being stored. The latter means that you are saving on storage while you are still able to visualize 'current' values. You can also still define thresholds on metrics you are not storing.
  2. How often to sample each metric. For example, you may want to sample CPU temperature values every minute, but fan speed values only every 10 minutes.
  3. How long to keep metrics data. For example, you may not be interested in disk performance metrics older than 3 months, but you may be interested in cluster load values over the lifetime of the cluster.
  4. How to consolidate each metric over time. For example, you may wish to keep used swap space values of nodes in the node category "large memory nodes" over the lifetime of the system, whereby values of the last 30 days should not be consolidated, but values older than 30 days may be averaged per hour, and values older than 90 days may be averaged per day.
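
The consolidation example in item 4 can be sketched in Python as follows. This is only an illustration of the policy described (raw values for the last 30 days, hourly averages beyond 30 days, daily averages beyond 90 days), not Bright Cluster Manager's implementation; the function names and the `(timestamp, value)` sample layout are assumptions.

```python
from collections import defaultdict

HOUR = 3600
DAY = 24 * HOUR


def bucket_width(age_seconds):
    """Return the averaging window for a sample of this age (0 = keep raw)."""
    if age_seconds > 90 * DAY:
        return DAY
    if age_seconds > 30 * DAY:
        return HOUR
    return 0


def consolidate(samples, now):
    """Consolidate (timestamp, value) samples according to their age.

    Returns a sorted list of (timestamp, value) pairs; averaged buckets
    are stamped with the start of their window.
    """
    raw = []
    buckets = defaultdict(list)
    for ts, value in samples:
        width = bucket_width(now - ts)
        if width == 0:
            raw.append((ts, value))
        else:
            start = ts - ts % width
            buckets[(start, width)].append(value)
    averaged = [(start, sum(vs) / len(vs)) for (start, _w), vs in buckets.items()]
    return sorted(raw + averaged)
```

Timestamps are assumed to be in seconds. Changing the thresholds in `bucket_width` is enough to express a different retention policy for another metric or node category.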

Monitoring Architecture

All monitoring data is sampled either locally by the cluster management daemon (CMDaemon) on each regular node and head node, or directly from the baseboard management controller (BMC) through the IPMI or iLO interface. In both cases, sampling is optimized for minimal resource consumption. For example, the CMDaemon samples all metrics in one process, without forking additional processes, whereas sampling through the IPMI or iLO interface happens out-of-band.

The CMDaemon on the head node periodically collects the data from the CMDaemons on the other nodes and stores it as raw data in the raw database hosted on the head node. The data is subsequently consolidated into the consolidated database, which is also hosted on the head node.

When the cluster management GUI generates a graph or a Rack View, it requests the required data from the CMDaemon on the head node, which reads it from the consolidated database.
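
The data flow described above (local sampling, periodic collection by the head node, raw storage, consolidation, and reads from the consolidated store) can be modelled with a small in-memory Python sketch. The class and method names are hypothetical and do not reflect CMDaemon's actual interfaces; consolidation is reduced to a simple average per node and metric.

```python
from statistics import mean


class NodeDaemon:
    """Stands in for the daemon on one node; buffers locally sampled values."""

    def __init__(self, name):
        self.name = name
        self._buffer = []

    def sample(self, timestamp, metric, value):
        self._buffer.append((timestamp, metric, value))

    def drain(self):
        """Hand over buffered samples and clear the buffer."""
        data, self._buffer = self._buffer, []
        return data


class HeadDaemon:
    """Stands in for the head-node daemon with a raw and a consolidated store."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.raw = []            # (node, timestamp, metric, value)
        self.consolidated = {}   # (node, metric) -> average value

    def collect(self):
        for node in self.nodes:
            for ts, metric, value in node.drain():
                self.raw.append((node.name, ts, metric, value))

    def consolidate(self):
        grouped = {}
        for node, _ts, metric, value in self.raw:
            grouped.setdefault((node, metric), []).append(value)
        self.consolidated = {key: mean(vals) for key, vals in grouped.items()}

    def query(self, node, metric):
        """What a GUI request would read: the consolidated value, if any."""
        return self.consolidated.get((node, metric))


node = NodeDaemon("node001")
head = HeadDaemon([node])
node.sample(0, "cpu_temp", 60.0)
node.sample(60, "cpu_temp", 62.0)
head.collect()
head.consolidate()
print(head.query("node001", "cpu_temp"))
```

In this model the GUI never contacts the regular nodes directly; it only reads what the head node has already collected and consolidated, which matches the description above.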

The diagram below illustrates the monitoring architecture:

Diagram of the Cluster Monitoring Architecture in Bright Cluster Manager
 
 
© 2009–2012 Bright Computing, Inc. All rights reserved.