By Ian Lumb | April 08, 2014
Tags: GPU, Cloud, Cloud Computing, Cluster Management, Linux Cluster Management, Amazon EC2, Bright Cluster Manager, CUDA, GPGPU management, Linux clusters with accelerators, Linux cluster with GPUs
Have you tried cloud-based GPUs for HPC? If not, here are the top four reasons you might want to.
The cloud is all about making resources available when you need them. Perhaps you need GPUs but don’t have access to them - because the on-the-ground GPUs are heavily used in production, or because there aren’t any on-the-ground GPUs(!). The cloud can help. Perhaps you’ve been working with legacy GPUs and want to benchmark your applications on more recent GPU technology - the latest GPU hardware and the latest version of the CUDA toolkit. The cloud can help. Perhaps it’s time to scale your application. Using the latest GPU technology, you can scale up your application in the cloud - e.g., to take advantage of the NVIDIA K40’s 2880 CUDA cores or GPU Boost. And, of course, you can use the cloud to scale out your application across multiple, or even many, GPUs. Cloud-based GPUs might enable you to do something you can’t do right now.
The promise of GPUs for HPC is increasingly compelling. Whether your metric is FLOPS/$ or FLOPS/W, GPUs deliver. The TOP500 and Green500 lists make this very clear: GPU-accelerated systems improve price-performance while reducing the energy consumed per unit of computation. GPU technology has delivered stunning application speedups for leadership-class HPC in government, education and commercial enterprise. The fact that the GPUs are cloud-based changes none of this: cloud-based GPUs still allow you to realize the promise of GPUs for HPC. Translation: You get results - better, faster and cheaper results.
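To make the two metrics concrete, here is a minimal Python sketch of how FLOPS/$ and FLOPS/W might be compared for two platforms. The `Platform` class and every number in it are hypothetical placeholders chosen for easy arithmetic, not measurements or vendor figures; substitute your own benchmark results, prices and power draw.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """One compute platform described by three numbers (all hypothetical here)."""
    name: str
    gflops: float       # sustained GFLOPS on your benchmark
    price_usd: float    # purchase price, or rental cost over the comparison period
    power_watts: float  # average power draw under load

    def gflops_per_dollar(self) -> float:
        return self.gflops / self.price_usd

    def gflops_per_watt(self) -> float:
        return self.gflops / self.power_watts


# Placeholder values for illustration only; they are not real measurements.
cpu_node = Platform("cpu-node", gflops=500.0, price_usd=5000.0, power_watts=400.0)
gpu_node = Platform("gpu-node", gflops=3000.0, price_usd=10000.0, power_watts=800.0)

if __name__ == "__main__":
    for p in (cpu_node, gpu_node):
        print(f"{p.name}: {p.gflops_per_dollar():.3f} GFLOPS/$, "
              f"{p.gflops_per_watt():.3f} GFLOPS/W")
```

The point of keeping both ratios side by side is that a platform can win on one metric and lose on the other; which matters more depends on whether your budget is dominated by acquisition cost or by power and cooling.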
The cloud is transparent by definition - isn’t it? To a degree, certainly. But for HPC applications, not entirely. For the cloud to be truly enabling, end users and their applications shouldn’t even be aware that the GPUs they’re using are based in the cloud. That’s the degree of transparency needed. Making cloud-based GPUs appear as if they’re local is therefore the first challenge. In other words, cloud-based GPUs need to be seamlessly and unobtrusively incorporated into the existing IT infrastructure - e.g., on-the-ground clusters need to utilize the cloud through logical extension.
Making cloud-based GPUs look local is table stakes. Why? The real technical challenge with HPC and the cloud is dealing with latency - a challenge not specific to GPUs, of course. By intelligently staging data for HPC, the latency of transferring data between the ground and the cloud can be addressed in practice. Persistent storage (e.g., Amazon S3) and/or archival/backup (e.g., Amazon Glacier) in the cloud can further minimize the impact of such transfers. Data aside, the even tougher latency challenge arises during computation. Some HPC applications are latency intolerant - e.g., those that pass messages between processing nodes. This means that running MPI applications in the cloud is often a non-starter - at least, not without significant effort to manage latency.
Of course, there are classes of applications that do not need to intersperse communication with computation, and for these applications the cloud is a ready-to-use platform - a platform where cloud-based GPUs could easily play a role. Latency notwithstanding, cloud-based GPUs can be transparently incorporated into local IT infrastructures.
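The latency argument above can be put into a back-of-the-envelope model. The Python sketch below is an illustration under simplifying assumptions, not a performance predictor: it treats a transfer as a fixed latency plus size divided by bandwidth, and treats each message in an iteration as costing one full latency with no overlap between communication and computation. The function names and all example numbers are hypothetical.

```python
def transfer_seconds(size_bytes: float, bandwidth_bytes_per_s: float,
                     latency_s: float = 0.0) -> float:
    """Time to stage one data set: fixed latency plus size over bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def communication_fraction(compute_s: float, messages: int,
                           latency_s: float) -> float:
    """Fraction of one iteration spent waiting on message latency.

    Assumes messages are sent one after another and are not overlapped
    with computation, which is a pessimistic simplification.
    """
    comm_s = messages * latency_s
    return comm_s / (compute_s + comm_s)


if __name__ == "__main__":
    # Hypothetical: stage 10 GB over a 1 Gbit/s link (125e6 bytes/s).
    print(f"staging time: {transfer_seconds(10e9, 125e6):.0f} s")

    # Hypothetical iteration: 10 ms of compute and 100 messages.
    for label, latency in (("microsecond-class latency", 1e-6),
                           ("millisecond-class latency", 1e-3)):
        frac = communication_fraction(0.01, 100, latency)
        print(f"{label}: {frac:.1%} of each iteration is communication")
```

With these placeholder numbers, the same iteration goes from spending about 1% of its time on communication to about 91% when per-message latency rises from a microsecond to a millisecond, which is why tightly coupled applications suffer while applications with no communication between tasks are unaffected.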
There’s no denying it: It takes serious effort to write code for GPUs, or to port existing code to make use of GPUs. From native programming languages like CUDA and OpenCL to directives-based approaches like OpenACC, there’s a need to support developers with an ecosystem of tools. From compilers, debuggers and profilers to enabling libraries and APIs, there is an entire development platform that ultimately relies upon a suitably modified Linux kernel that enables GPUs for use as compute accelerators. Fortunately, the entire stack can be provisioned, monitored and managed using software that combines all the pieces into a usable whole for GPU-based HPC in the cloud. In other words, developers and end users continue to use the applications they are familiar with despite the fact that their GPUs are now based in the cloud.
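As a small illustration of what "the entire stack" means in practice, the Python sketch below checks whether a few command-line tools from the GPU development stack are on the `PATH` of a node. It assumes `nvcc` (the CUDA compiler), `cuda-gdb` (the CUDA debugger) and `nvidia-smi` (the driver's management utility) are the tools you care about; that selection is an assumption made here, and a real provisioning system would verify far more than this.

```python
import shutil

# Assumed set of tools to look for; adjust to match your own stack.
CUDA_TOOLS = ("nvcc", "cuda-gdb", "nvidia-smi")


def find_tools(names=CUDA_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=CUDA_TOOLS):
    """Return the tool names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Running the same check on an on-the-ground node and on a cloud node is one quick way to confirm that both present the same tools to developers.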
The cloud can be an enabler in making use of GPUs for HPC. These top four reasons indicate why you might want to take cloud-based GPUs for a test drive if you haven’t already.