Deep Learning in Bright Cluster Manager Version 7.3


Enterprises have been busy for years collecting large amounts of data and analyzing it to obtain a competitive advantage by using machine learning – developing algorithms that can learn from and make predictions on data. Now some are looking to go even deeper – using machine learning techniques called deep learning to create predictive applications for fraud detection, demand forecasting, click prediction, and other data-intensive analyses.

Where traditional data analysis is helpful in extracting simple information, deep learning can delve into the more esoteric properties hidden in the data. It attempts to address the problems the way a human expert would. One blog I recently read listed eight visual examples of deep learning; a few interesting ones include colorization of black and white images; adding sounds to silent movies; and automatic machine translation.

The computer vision, speech recognition, natural language processing, and audio recognition applications being developed using deep learning techniques need large amounts of computational power to process large amounts of data.

As new tools designed specifically for deep learning become available, many developers are using them to build their applications on advanced clusters that take advantage of accelerators like NVIDIA’s Tesla M40.

To get the insights enterprises are looking for with deep learning, the underlying IT infrastructure needs to be deployed and managed as enterprise-grade, not as a lab experiment. Our customers have told us about the enormous challenges they face building and managing advanced clusters for deep learning. That’s why Bright stepped in with Bright for Deep Learning, a Bright Cluster Manager Version 7.3 solution that makes it easy to set up a deep learning IT infrastructure, and to develop and run deep learning applications.

Some basics – what is deep learning and how is it different from machine learning?

Machine learning involves developing algorithms that operate by building a model from example inputs to make data-driven predictions or decisions. There are three types of machine learning: supervised machine learning, unsupervised machine learning, and reinforcement learning.

With supervised machine learning, the program is “trained” on a predefined set of labeled examples. For example, one may feed the program information on prior home sales prices based on neighborhood, number of bedrooms, and total square footage, and then ask it to predict what the sales price would be for new sales. (Any good real estate agent knows how to price houses based on area, neighborhood, and similar factors, but programming a computer to do that using standard techniques is cumbersome.) Another example: show the computer predefined sets of labeled data (collections of images of cats and dogs), and it will learn to properly identify other images.
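To make the home price example concrete, here is a minimal sketch of supervised learning in plain Python. It assumes a single feature (square footage) and a handful of made-up past sales; the data, the function names `fit_line` and `predict`, and the choice of ordinary least squares are all illustrative assumptions, not something Bright ships.

```python
# Hypothetical training data: (square footage, sale price) for past home sales.
past_sales = [(1000, 200_000), (1500, 275_000), (2000, 350_000), (2500, 425_000)]

def fit_line(samples):
    """Fit price = intercept + slope * sqft by ordinary least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, sqft):
    """Apply the fitted line to a home the program has not seen before."""
    intercept, slope = model
    return intercept + slope * sqft

model = fit_line(past_sales)       # "training" on the labeled examples
print(round(predict(model, 1800))) # estimated price for a new 1,800 sq ft home
```

A real pricing model would use many more features (neighborhood, number of bedrooms) and far more data, but the pattern is the same: learn from labeled examples, then predict for new inputs.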

Unsupervised machine learning means the program is given a large amount of data and must find nonlinear relationships within the data provided. An example of this might be looking at real estate data and determining which factors lead to higher prices in certain parts of the city.
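One common unsupervised technique is clustering, where the program groups similar records without being told what the groups are. The sketch below uses a simple one-dimensional k-means on made-up price-per-square-foot figures; the data, the function name `kmeans_1d`, and the choice of k-means itself are assumptions made purely for illustration.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters (k >= 2) with a basic k-means loop."""
    # Deterministic start: spread the initial centroids between min and max.
    lo, hi = min(values), max(values)
    centroids = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels

# Hypothetical, unlabeled price-per-square-foot figures from across a city.
price_per_sqft = [100, 110, 105, 300, 310, 295]
centroids, labels = kmeans_1d(price_per_sqft)
print(centroids, labels)
```

Nobody told the program which homes were in the expensive part of town; it separated the two groups from the data alone.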

Reinforcement learning is when a computer program interacts with a dynamic environment in which it must perform a certain task, learning from the feedback it receives along the way. Examples include interacting dynamically with social media to collect data on the public sentiment on an issue. The computer can get information from data and predict future contributions in real time.

Machine learning works only if the problem is solvable with the available data. (You cannot estimate the price of an airfare based on whether the customer has a dog.) If the data wouldn’t help a human expert solve the problem, it won’t help the machine either.

This is where deep learning comes in. Deep learning is a dynamic system that emulates the human brain, especially how neurons interact in the brain, and how different layers of the brain work together. Its use has spiked, in part due to the availability of large datasets for training the computer and fast hardware like GPUs.

Deep learning allows the kind of segmentation (partitioning the digital image into segments that make it easier to analyze) that enables extraction of high-level information that can be encoded for computer use. Getting back to the earlier animal analogies, instead of just classifying pictures of cats, deep learning will let users distinguish pictures of cats lying on the ground from those jumping.
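The idea of layers of neurons working together can be sketched in a few lines of plain Python. This is a toy forward pass through a two-layer network with hand-picked weights; the weights, layer sizes, and function names are hypothetical, and a real network would learn its weights from data using one of the frameworks mentioned below, typically on GPUs.

```python
import math

def relu(x):
    """Hidden-layer activation: pass positive signals, block negative ones."""
    return max(0.0, x)

def sigmoid(x):
    """Output activation: squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of all inputs, then activates."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical, hand-picked weights; training would normally set these.
hidden_w = [[0.5, -0.2], [0.1, 0.8]]
hidden_b = [0.0, 0.1]
out_w = [[1.0, -1.0]]
out_b = [0.0]

def forward(features):
    """Feed the input through the hidden layer, then the output layer."""
    hidden = dense(features, hidden_w, hidden_b, relu)
    return dense(hidden, out_w, out_b, sigmoid)[0]

score = forward([1.0, 2.0])
print(score)  # a value between 0 and 1, e.g. a "jumping cat" probability
```

Stacking more of these layers is what puts the "deep" in deep learning: each layer builds on the features extracted by the one before it.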

Bright for Deep Learning makes it faster and easier for organizations to use deep learning techniques

Several customers told us that new deep learning software modules can take them days to download and install from the open source repositories. Bright’s vision was to develop a comprehensive solution that would provide everything needed to spin up an effective deep learning environment, and manage it effectively.

And what does that include?

Choice of machine learning frameworks – Bright Cluster Manager Version 7.3 provides a choice of machine learning frameworks, including Caffe, Torch, TensorFlow, and Theano, to simplify deep learning projects.

Choice of machine learning libraries – Bright includes a selection of the most popular machine learning libraries to help access datasets. These include MLPython, NVIDIA CUDA Deep Neural Network library (cuDNN), Deep Learning GPU Training System (DIGITS), NCCL, TensorRT, and CaffeOnSpark, the open source solution for distributed deep learning on big data clusters. More will be added in the future, for example, CNTK, BIDMach, and others.

Supporting infrastructure – With Bright, users don’t have to worry about finding, configuring, and deploying all of the dependent pieces needed to run deep learning libraries and frameworks. Bright Cluster Manager Version 7.3 includes Python modules that support machine learning, plus the NVIDIA hardware drivers, CUDA (parallel computing platform API) drivers, CUB (CUDA building blocks), and NCCL (library of standard collective communication routines).

It all adds up to a really simple deployment of deep learning infrastructure, which lets users deploy deep learning environments in minutes. Bright also provides and installs environment modules that make it easy to dynamically modify the user environment.

Deep learning applications can also be scaled beyond a single machine, spreading the processing across an entire cluster for better performance.

If users need more capacity, the deep learning features let them extend GPU-enabled instances into the cloud using Bright’s cloud bursting capability. Bright also makes it easy to containerize deep learning applications, or run them in a private OpenStack cloud. Users can even take advantage of the performance provided by modern clusters with RDMA-enabled interconnects by running a deep learning application using RDMA-Spark.  

At our Bright machine learning laboratories in San Jose, we’ve been working on combining state-of-the-art HPC clusters with deep learning. For example, we are currently working with a major university that is using deep learning techniques on fraud detection for credit companies. We have also worked with a major imaging and electronics company on a project that is using the Caffe deep learning framework to classify images, as well as for segmentation and separation of objects for handwriting recognition.

We believe we are only in the infancy stage of deep learning and we are excited to see how the new Bright for Deep Learning will make it faster and easier for organizations to use deep learning to gain actionable insights from rich, complex data.

Download our eBook on building a deep learning environment

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 711315 Bright Beyond HPC


Building a Deep Learning Environment for Your Organization