Data Center Technology: HPC, OpenStack, or Hadoop?


By Lionel Gibbons | October 30, 2017 | OpenStack, HPC, Hadoop



The latest IDC big data report says the world will go from producing a current annual average of 30 zettabytes of data (a zettabyte is a trillion gigabytes) to 163 zettabytes by 2025. That abstract figure likely mirrors a concrete concern in your own data center: uncontrolled data growth becomes unmanageable when HPC, big data, and OpenStack provisioning, management, and monitoring are segregated.

Big data, HPC, and OpenStack are at the heart of the modern data center's transformation, with use cases spanning a growing list of sectors. But as the market for HPC systems expands, new users demand data center technology that makes these systems easier to deploy and use.

As big data clusters and public and private cloud use grow, the silos in which they each exist become increasingly unsustainable, adding management, maintenance, and cost overhead. Provisioning, managing, and monitoring each silo by hand is complex and slow, and user needs constantly change. The segregation of HPC, big data, and OpenStack is holding back the data center.

Why the Segregation of HPC, Big Data, and OpenStack Limits the Modern Data Center

The number of business and industry sectors using Hadoop, Spark, and HPC, along with private cloud (OpenStack) and public cloud (AWS and Azure), is constantly growing. Unfortunately, the segregation of these elements, along with cluster needs that are constantly shifting, makes it nearly impossible to achieve integrated workflows.

Mixed-use cluster environments operating outside of pure-play HPC face many hardware, software, and administration issues. One challenge in streamlining and integrating big data and HPC is that Hadoop clusters typically put storage on every compute node, while HPC clusters may use diskless nodes with no local storage at all. Differing file systems (local HDFS storage versus shared parallel file systems) make it difficult to set up a common file system between the two.
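The placement consequence of that storage mismatch can be sketched in a few lines. This is a hypothetical illustration, not code from any specific cluster manager; the node names and the `local_disks_gb` attribute are invented for the example:

```python
# Hypothetical sketch: in a mixed Hadoop/HPC pool, placement must be
# storage-aware, because a Hadoop DataNode needs local disks while a
# diskless HPC compute node has none.

def eligible_for_hdfs(node):
    """A node can host HDFS data only if it has local disks."""
    return node["local_disks_gb"] > 0

nodes = [
    {"name": "node01", "local_disks_gb": 4000},  # typical Hadoop worker
    {"name": "node02", "local_disks_gb": 0},     # diskless HPC compute node
]

hdfs_pool = [n["name"] for n in nodes if eligible_for_hdfs(n)]
print(hdfs_pool)  # only the node with local disks qualifies
```

A real cluster manager tracks far more than disk capacity, but the same filtering logic is what keeps a shared pool from assigning HDFS roles to diskless nodes.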

As you consolidate data centers, you’ll want to move away from dedicated HPC and big data clusters while moving more workloads to the public and private cloud. The challenge here is the difficulty in provisioning, managing, and monitoring OpenStack on bare metal while being able to integrate with VMs.

Provisioning may be at the heart of the cluster management challenges facing data centers that are heavily invested in big data and HPC clusters while also expanding their use of public and private clouds. This expansion brings a number of benefits, including cloud bursting, which enables on-demand high-performance cluster extensions or temporary storage offload. It also brings challenges: networking bottlenecks, performance tuning, simplified image management, and containing public cloud costs when data must be repatriated.
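The bursting decision itself is simple to state: extend into the cloud only when the local backlog exceeds what on-premises nodes can absorb, and cap the spend. A minimal sketch of such a policy, with thresholds and parameter names invented for illustration:

```python
# Hypothetical cloud-bursting policy sketch. The jobs_per_node and
# max_cloud_nodes values are illustrative, not from any specific product.

def nodes_to_burst(queued_jobs, idle_local_nodes, jobs_per_node=4, max_cloud_nodes=16):
    """Return how many temporary cloud nodes to request, if any."""
    backlog = queued_jobs - idle_local_nodes * jobs_per_node
    if backlog <= 0:
        return 0  # local capacity suffices; no burst needed
    needed = -(-backlog // jobs_per_node)  # ceiling division
    return min(needed, max_cloud_nodes)   # contain public cloud cost

print(nodes_to_burst(queued_jobs=50, idle_local_nodes=2))  # 11
print(nodes_to_burst(queued_jobs=4, idle_local_nodes=2))   # 0
```

The cost-containment cap is the part that matters in practice: without it, a burst policy happily converts a transient backlog into a large cloud bill.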

What we all need is a cohesive strategy that handles all of these data center technologies in a unified way, taking front-end requirements into account so that workflows stay cohesive. That comes down to integrating provisioning, management, and monitoring through automation with integrated cluster management.

The Benefits of Integrated Cluster Management

More and more, organizations need data center technologies that streamline big data and HPC clusters, along with OpenStack, and automate provisioning, management, and monitoring through a single pane of glass. Integrated cluster management solutions are designed to deliver that level of integration through process automation.

One benefit is the ability to provision both Hadoop and HPC environments from bare metal, all from the same pool of resources with a single point of management. Change management then applies to both environments, so changes made to an image propagate to nodes in real time without rebooting.
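The single-pool idea can be made concrete with a small sketch. Everything here is hypothetical for illustration (the node names, the role names, and the shared-image structure); it shows only the bookkeeping: one free pool feeds both roles, and updating a shared image is instantly visible to every node that references it, with no per-node rebuild:

```python
# Hypothetical sketch: one free pool provisions both Hadoop and HPC roles,
# and nodes reference a shared image, so an image update reaches all of
# them at once.

free_pool = ["n01", "n02", "n03", "n04"]
image = {"version": 1}                 # shared image, referenced by all nodes
clusters = {"hadoop": [], "hpc": []}

def provision(role, count):
    """Move nodes from the common free pool into the requested role."""
    for _ in range(count):
        node = free_pool.pop(0)
        clusters[role].append({"name": node, "image": image})

provision("hadoop", 2)
provision("hpc", 2)

image["version"] = 2  # one change, propagated to every node's reference
print(all(n["image"]["version"] == 2 for c in clusters.values() for n in c))
```

In a real product the propagation is done by the cluster manager syncing image deltas to running nodes; the sketch only captures the single-source-of-truth structure that makes that possible.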

The cloud, whether private or public, is expected to provide elasticity, service-based configurability, multi-tenancy, and on-demand access across data center technologies. That requires a private cloud orchestration framework that provisions resources based on customer requests, including spinning up clusters on demand. Open-source cloud management platforms such as OpenStack fill this role.

While OpenStack can spin up, move, and shut down a VM, it lacks control of the OS inside it. A complete cluster resource management solution adds control of the environment inside the VM. In addition, your data center technology benefits from automated node provisioning and allocation that treats physical and virtual servers across HPC and big data clusters in the same way.

The ability to spin up and provision VMs, as well as clusters, in both the public cloud and private cloud, along with integration of bare metal servers as cluster nodes, is a major benefit. Better still, it provides integrated management and monitoring across all cluster environments, including containers. When you add in cloud bursting, along with simplified configuration and image management, your data center takes a major leap forward.

Having virtual nodes that look the same as, and are interchangeable with, physical nodes delivers elasticity across HPC and big data clusters, as well as across private and public cloud environments. The bottom line: with integrated cluster management you no longer have to compromise, because you get full integration and a unified single-pane-of-glass GUI for managing, monitoring, and reconfiguring your data center ecosystem any way you want.

Achieving a Dynamic Data Center