How to get the most out of Hadoop as it continues to evolve


By Lionel Gibbons | August 27, 2015 | Big Data, Hadoop




Hadoop is a name that technologists have been hearing a lot in the past few years, and with good reason. It's one of the rising stars of business technology and will probably impact just about every enterprise in some way.

If you're unfamiliar with Hadoop, it's an open-source software framework that is used to process large data sets, often referred to as "big data" by information systems professionals.
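To make "processing large data sets" a little more concrete, here is a minimal sketch of the map, shuffle, and reduce pattern that Hadoop's MapReduce engine is built around. It is plain Python running on one machine, not Hadoop code, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions rather than any Hadoop API. On a real cluster the framework would spread these steps across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input standing in for a much larger data set.
    sample = ["big data big clusters", "data nodes store data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, the same logic can in principle be split across as many machines as the data requires.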

Hadoop is the unifying framework for big data applications

According to computer scientist Wolfgang Hoschek, who has over 15 years of experience with large-scale distributed systems, data-intensive computing, and real-time analytics, Hadoop is the unifying framework for big data applications. In his view, it has no real competitors. That said, he explained in an interview with HPC Wire, there are still missing pieces to Hadoop's functionality.

With software rallying around it, progress on that functionality is happening quickly, Hoschek says. The Hadoop ecosystem is evolving rapidly due to demand, innovation, and lessons learned, and it all happens through open source collaboration. Turnkey Hadoop distributions and enterprise data hubs are offered by a variety of companies, with integration to legacy systems and all components of the Hadoop ecosystem. Vendor products now take care of the installation, configuration, monitoring, troubleshooting, upgrades, tuning, maintenance, and other aspects that previously made deploying a big data solution a headache-inducing process.

Bright Computing is one such vendor, providing quality cluster management solutions for Hadoop. Our solution for Hadoop allows users to deploy a complete Hadoop cluster over bare metal, and its single-pane-of-glass management goes a long way towards eliminating these deployment headaches.

With our approach, getting started is pretty simple. You answer a few questions about your cluster, and our software will have it properly installed and configured in no time. You can deploy multiple distributions and operate multiple Hadoop instances simultaneously. Bright Cluster Manager monitors the entire cluster, including the head node, data nodes, hardware, and operating system, and alerts you if there’s a problem. Peace of mind beats headaches.

Hadoop continues to evolve

Hadoop used to mean an open-source software framework for distributed storage and processing of very large data sets. And while that’s still true, there’s more to it than that now, as Hadoop has been extended in a number of ways to keep up with the needs of big data users.

At Bright, we’ve extended Hadoop by integrating the deployment and management of the servers and operating system that underpin a functioning Hadoop cluster with the management of the cluster itself. Unifying the configuration, deployment, monitoring, and management of the entire stack helps to take the pain out of deploying Hadoop.

According to Charles Zedlewski, Cloudera's Vice President of Products, the fact that Hadoop has evolved is a good thing. He believes that Hadoop is a better big data platform because "the substitution of one component in the stack does not obviate the overall platform."

What lies ahead for Hadoop?

Hadoop is an attractive investment to enterprises because of one simple principle: You can never have too much information. That's also why Forrester Research says that, going forward, adopting Hadoop will not be optional for enterprises. Here are a few things you can expect to see for Hadoop in the future:

SQL integration - Technologists already use Structured Query Language (SQL) to perform create, read, update, and delete (CRUD) operations on enterprise data. However, those operations are almost always performed against a relational database, and not against a Hadoop implementation.

Some companies currently offer SQL integration with Hadoop. You can expect to see that integration continue. It's important that Hadoop responds to SQL queries so that IT professionals who are already familiar with SQL will face a gentler slope on the learning curve as they move to the new technology.

The skills shortage evaporates - Many high-level managers have evaluated Hadoop and found that the technology is promising. However, they have been reluctant to adopt it in their own organizations because there aren’t that many Hadoop professionals in the workforce. Expect to see the supply of people familiar with this big data technology continue to grow.

Hadoop will reside in the cloud - Hadoop isn’t just a data storage system; it's also a data processing system. Both its storage component and its processing component can reside in the cloud. This will become increasingly important as organizations move towards adopting a private cloud solution to empower their employees and make data resources more widely available.
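Returning to the SQL integration point above, the sketch below shows the kind of everyday aggregate query that SQL-savvy staff already write. It runs against an in-memory SQLite database purely as a stand-in, since SQLite ships with Python and needs no cluster; SQLite is not part of Hadoop. The `page_views` table, its columns, and the `top_pages` helper are hypothetical names chosen for illustration. SQL-on-Hadoop engines such as Apache Hive accept broadly similar SELECT, GROUP BY, and ORDER BY statements, though dialect details differ.

```python
import sqlite3


def top_pages(rows, limit=2):
    """Return the most-viewed pages using an ordinary SQL aggregate query."""
    conn = sqlite3.connect(":memory:")  # stand-in database, not Hadoop
    conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
    conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
    query = """
        SELECT page, SUM(views) AS total_views
        FROM page_views
        GROUP BY page
        ORDER BY total_views DESC, page ASC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a large log data set.
    sample = [("home", 3), ("docs", 5), ("home", 4), ("blog", 1)]
    print(top_pages(sample))
```

The point of the sketch is the query itself: if a big data platform accepts statements like this one, existing SQL skills carry over with little retraining.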

Take the next step

If you’ve been thinking about adopting Hadoop in your organization, there’s never been a better time to do it. The software has matured, the ecosystem is broad, and the pool of Hadoop-savvy talent is bigger than ever. If you'd like to learn more about Hadoop, feel free to contact us. We're here to help.