Automated Cluster Management

Automated Cluster Management lets cluster administrators set a threshold on any metric and define an action to be taken when that threshold is crossed. Any of the built-in or custom metrics supported by Bright Cluster Manager™ can be used, and any cluster management shell command, Linux command, or script can serve as an action.
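The threshold-plus-action idea can be illustrated with a minimal Python sketch. It is not Bright Cluster Manager's implementation or API: the `ThresholdRule` class, the `evaluate` function and the metric name `CPUTemperature` are hypothetical, and the action is simply a Linux command run through `subprocess.run`.

```python
from __future__ import annotations

import operator
import subprocess
from dataclasses import dataclass
from typing import Callable


# Hypothetical rule model (not Bright's API): a metric name, a comparison,
# a threshold, and a Linux command to run when the comparison holds.
@dataclass
class ThresholdRule:
    metric: str
    compare: Callable[[float, float], bool]  # e.g. operator.gt or operator.lt
    threshold: float
    action: list[str]  # command line passed to subprocess.run

    def triggered(self, value: float) -> bool:
        return self.compare(value, self.threshold)


def evaluate(rule: ThresholdRule, samples: dict[str, float]) -> bool:
    """Run the rule's action if its metric crosses the threshold.

    Returns True when the action was run, False otherwise."""
    value = samples.get(rule.metric)
    if value is None or not rule.triggered(value):
        return False
    subprocess.run(rule.action, check=True)  # raises if the command fails
    return True


if __name__ == "__main__":
    rule = ThresholdRule(
        metric="CPUTemperature",  # hypothetical metric name
        compare=operator.gt,
        threshold=60.0,
        action=["echo", "CPU temperature above threshold"],  # any Linux command
    )
    print(evaluate(rule, {"CPUTemperature": 65.0}))
```

Passing the comparison as a function lets the same structure express both "exceeds" and "drops below" thresholds.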

Examples of Actions

Some examples of "actions" that can be configured with Bright Cluster Manager™ include:

Examples of Rules

Some examples of "rules" that can be configured with Bright Cluster Manager include:

  • If the free space in /home drops below 9.3 GB, send an email to administrator@localhost.
  • If the number of running jobs exceeds 120, log an event in the GUI event viewer.
  • If the temperature of any node in the node category "Large SMP Nodes" exceeds 60 degrees Celsius, send an SMS text message to mobile phone number +1 123 123 1234 and shut down the offending node.
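The three example rules above can be expressed as data and evaluated against sampled metrics. This Python sketch is illustrative only: the `Entity` and `Rule` classes, the `fire` function and the metric names (`FreeSpaceHomeGB`, `RunningJobs`, `CPUTemperature`) are hypothetical, cluster-wide metrics are assumed to come from a pseudo-entity named `cluster`, and actions are recorded as strings rather than actually sending email or SMS.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entity:
    """A monitored object: a node, or a pseudo-entity for cluster-wide metrics."""
    name: str
    category: Optional[str]
    metrics: dict[str, float]


@dataclass
class Rule:
    metric: str                         # hypothetical metric name
    condition: Callable[[float], bool]  # threshold test on the sampled value
    actions: list[str]                  # descriptions of the actions to take
    category: Optional[str] = None      # only match entities in this category


def fire(rules: list[Rule], entities: list[Entity]) -> list[tuple[str, str]]:
    """Return (entity name, action) pairs for every rule whose condition holds."""
    fired = []
    for rule in rules:
        for entity in entities:
            if rule.category is not None and entity.category != rule.category:
                continue  # rule is restricted to another node category
            value = entity.metrics.get(rule.metric)
            if value is not None and rule.condition(value):
                fired.extend((entity.name, action) for action in rule.actions)
    return fired


# The three example rules, with hypothetical metric names.
RULES = [
    Rule("FreeSpaceHomeGB", lambda v: v < 9.3, ["email administrator@localhost"]),
    Rule("RunningJobs", lambda v: v > 120, ["log event in GUI event viewer"]),
    Rule("CPUTemperature", lambda v: v > 60.0,
         ["sms +1 123 123 1234", "shut down node"], category="Large SMP Nodes"),
]

if __name__ == "__main__":
    entities = [
        Entity("cluster", None, {"FreeSpaceHomeGB": 8.0, "RunningJobs": 95}),
        Entity("smp01", "Large SMP Nodes", {"CPUTemperature": 63.5}),
        Entity("node001", "default", {"CPUTemperature": 70.0}),  # other category
    ]
    for name, action in fire(RULES, entities):
        print(f"{name}: {action}")
```

With these sample values the email rule and the temperature rule fire for `cluster` and `smp01` respectively, while `node001` is ignored because it is not in the "Large SMP Nodes" category.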

This can save administrators a great deal of time. For example, you can monitor the health of your cluster and take preemptive action when hardware shows signs of imminent failure, or monitor cluster usage and act before the cluster runs out of resources.

A configuration wizard guides you through the steps of defining a rule: selecting a metric, defining a threshold, and defining an action.

State Flapping

The Automated Cluster Management system is highly configurable. One example is its handling of so-called "state flapping", a situation where a threshold is crossed repeatedly within a short time frame. This can happen, for example, when a CPU temperature fluctuates around a configured threshold, potentially causing the system to send out many emails in quick succession. The system can detect such a situation and can be configured to specify exactly how to handle it.
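One common way to detect flapping is to count threshold state changes within a sliding time window. The Python sketch below uses that generic technique; it is not necessarily how Bright Cluster Manager detects flapping, and the `FlapDetector` class and its `window` and `max_transitions` parameters are hypothetical.

```python
from __future__ import annotations

from collections import deque


class FlapDetector:
    """Suppresses actions for a metric whose threshold state changes too often.

    Hypothetical parameters: at most `max_transitions` state changes are
    tolerated within `window` seconds; beyond that the metric is flapping."""

    def __init__(self, threshold: float, window: float, max_transitions: int):
        self.threshold = threshold
        self.window = window
        self.max_transitions = max_transitions
        self.above = False                         # current threshold state
        self.transitions: deque[float] = deque()   # timestamps of state changes

    def update(self, t: float, value: float) -> str | None:
        """Feed one sample taken at time t.

        Returns "trigger" when the action should run, "flapping" when a state
        change is suppressed because of flapping, and None otherwise."""
        above = value > self.threshold
        if above == self.above:
            return None  # no state change
        self.above = above
        self.transitions.append(t)
        # Forget state changes that fall outside the sliding window.
        while self.transitions and t - self.transitions[0] > self.window:
            self.transitions.popleft()
        if len(self.transitions) > self.max_transitions:
            return "flapping"
        return "trigger" if above else None
```

For example, a temperature oscillating around 60 degrees Celsius triggers the action twice, is then reported as flapping instead of sending further emails, and triggers normally again once the oscillation has stopped:

```python
det = FlapDetector(threshold=60.0, window=60.0, max_transitions=3)
samples = [(0, 58), (10, 61), (20, 59), (30, 62), (40, 58), (50, 63), (200, 58), (260, 64)]
print([det.update(t, v) for t, v in samples])
# [None, 'trigger', None, 'trigger', 'flapping', 'flapping', None, 'trigger']
```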

© 2010 Bright Computing, Inc. All rights reserved.