How to Spin Up Hadoop in the Cloud


By Lee Carter | March 07, 2015 | Cloud, Cloud Computing, Cloud Manager, Hadoop



Like many industries, the pharmaceutical sector is moving towards cloud bursting. If you need more compute power or more storage, don't pay for an onsite solution: look to the cloud! Cloud bursting also fits the subscription-based, "pay per use" business model that is becoming increasingly familiar.


It’s been well documented that Bright can help you easily create new clusters in the cloud, or add cloud-based resources to your existing clusters on the fly.

How does Bright cloud burst?  

Well, Bright offers two cloud bursting scenarios:

  • Scenario 1: Cluster-on-Demand is ideal if you do not have a cluster onsite, for example because you lack the space, power, cooling, or budget. If you still need compute power, Bright can instantly create a complete cluster in the public cloud, for as long as you need it.
  • Scenario 2: Cluster Extension suits companies that have an onsite cluster without enough compute power. In this case, Bright instantly adds compute power from a public cloud to the onsite cluster, for as long as it is needed.

Our cloud bursting solutions are proving to be very popular in the OpenStack and HPC space.

What about Hadoop?

Something that hasn't been so well documented, because it's fairly new, is that a growing number of my pharmaceutical customers are asking for the ability to spin up Hadoop instances in the cloud for a limited period of time, to access and analyse big data.

Spinning up Hadoop in the cloud makes the most sense if the data is already in the cloud, or if the use case allows the data to be moved there. In situations like this, Bright spins up a Hadoop cluster in the cloud, giving you on-demand access to extra compute power for the duration of the analysis project.

Think of it, if you will, as Hadoop-as-a-Service.

When would you use this?

Pharmaceutical companies are looking to Bright to help them process large quantities of data in ad hoc big data analysis projects. For example, once a month you might want to analyse large data sets for product development. Or you may wish to carry out a quarterly analysis of patient outcomes.

Once the analysis project is complete, the cloud instance of Hadoop is quickly powered off, and you pay "as a service" only for the hours or days during which the additional compute power was used. You don't need to purchase any additional hardware, and you don't have to worry about storing the data yourself (for example, purchasing and maintaining storage equipment, keeping backups, etc.).

The beauty of this solution from Bright is that we are the enabling technology: there is no need to enlist the help of a third party, or to pay a cloud service provider to build this service for you. And there is no vendor lock-in. Bright gives you the tools you need to spin up Hadoop yourself and rent the additional cloud resources directly. This gives you the freedom to choose which Hadoop distribution you wish to run. If you opt for a Hadoop service from your cloud provider, the Hadoop distribution and version will be chosen for you.

And it's all managed from the same Bright user interface that you use to manage your day-to-day HPC, OpenStack, and Hadoop infrastructure, giving you a single-pane-of-glass view of your entire operation.

In addition, you can now use the cloud for data backup and storage, in case future analysis is required. For example, raw data logs increasingly go directly to Amazon for storage, where they stay indefinitely. If data analysis only needs to be carried out on an ad hoc basis, there is no need to pay for a Hadoop cluster 365 days a year. With Bright, you can deploy a cloud instance of Hadoop, draw the data from the cloud store, carry out the analysis, power off the Hadoop cluster, and repeat as required.
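To make the cost argument concrete, here is a rough back-of-the-envelope sketch in Python comparing a Hadoop cluster that runs all year with one that is spun up for each analysis run and powered off afterwards. Every figure in it (the per-node hourly rate, the cluster size, the run length and the power-on/power-off overhead) is a hypothetical placeholder, not a real cloud price; plug in your own provider's pricing and your own workload.

```python
# Hypothetical figures for illustration only; substitute your own cloud
# provider's pricing and your own workload profile.
NODE_HOUR_RATE = 0.50       # assumed cost per node per hour (USD)
NODES = 20                  # assumed cluster size
HOURS_PER_YEAR = 365 * 24   # 8,760 hours


def always_on_cost(nodes: int, rate: float) -> float:
    """Cost of keeping a cluster running 365 days a year."""
    return nodes * rate * HOURS_PER_YEAR


def on_demand_cost(nodes: int, rate: float, runs_per_year: int,
                   analysis_hours: float, overhead_hours: float) -> float:
    """Cost of spinning a cluster up for each run and powering it off afterwards.

    overhead_hours covers power-on, configuration and power-off time, which
    is billed on every run even though no analysis happens during it.
    """
    billed_hours_per_run = analysis_hours + overhead_hours
    return nodes * rate * runs_per_year * billed_hours_per_run


if __name__ == "__main__":
    yearly = always_on_cost(NODES, NODE_HOUR_RATE)
    # Assumed workload: one 48-hour analysis per month, 1 hour of overhead per run.
    monthly_bursts = on_demand_cost(NODES, NODE_HOUR_RATE, runs_per_year=12,
                                    analysis_hours=48, overhead_hours=1)
    print(f"Always on:      ${yearly:,.2f}")
    print(f"Monthly bursts: ${monthly_bursts:,.2f}")
```

With these assumed figures, the script prints $87,600.00 for the always-on cluster and $5,880.00 for twelve monthly bursts. The overhead_hours parameter also shows why slow power-on and power-off processes matter: that time is billed on every run.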

Who can benefit from this?

Hadoop-as-a-Service will resonate with:

  • Organisations that need to carry out big data analysis on an ad hoc basis.
  • Organisations that don’t have the resources to invest in a huge Hadoop cluster and the associated hardware.

Watch out for pitfalls!

You may have heard that other vendors are starting to bundle solutions for this trend. Here are some potential pitfalls to avoid:

  1. Make sure that the solution you choose isn’t based on inflexible, pre-defined templates that are limited to a specific number of nodes. You should be able to choose how many nodes you need.
  2. Make sure the solution you choose is fully integrated with the cloud. Don't waste time on unnecessary setup and integration processes.
  3. Make sure the solution works out of the box and is fully automated. Don't incur unnecessary rental costs due to lengthy power-on and power-off processes.
  4. Make sure the solution works with the Hadoop distribution and version that you are interested in using.

How does Bright bring value?

  • You’re in control! You choose the vendor technology, how many nodes you require, and for how long you require them.
  • You can use the same Bright user interface to manage cloud infrastructure and Hadoop.
  • We make installation and deployment of Hadoop in the cloud easier and faster than most other approaches.
  • We quickly deploy, run, and remove Hadoop clusters of any size and for any duration. You only pay for actual usage.  
  • We support and integrate with several different Hadoop distributions and versions.
  • We offer a completely flexible and scalable solution.
  • Finally, because Bright + Hadoop installs and deploys faster than other solutions, and comes with automatic pre-configured settings that are optimised for the cloud, you consume fewer billable cloud resources, so the final cost is lower.



Remember to subscribe to our blog to stay on top of the latest tips, innovations, and best practices for cluster management.