CUDA 6.5: Something for Nothing



Who says you can’t get something for nothing? With CUDA 6.5, you can! The something is improved performance. The nothing is that no code changes are required.

NVIDIA made improved performance a focal point of the recent CUDA 6.5 release. For example, if you make use of CUDA-enabled FFTs or sparse-matrix routines, all you need to do is recompile your application with this latest release of the toolkit. You’ll see performance improvements without having to change a single character in your source code. Even the size of object files can be reduced with the nvprune utility, which strips out device code for GPU architectures you do not target.

NVIDIA has also made developer productivity a focal point in CUDA 6.5. Improved debugging capabilities for CUDA Fortran applications, as well as application profiling that includes a replay capability, are new to this release of the toolkit.

Of course, developer productivity is not only about development environments with tools like editors, debuggers and profilers. NVIDIA has also been working hard to simplify the programming model itself. CUDA 6 introduced unified memory, a model that lets developers program as though memory were shared between GPUs and CPUs. In CUDA 6.5, NVIDIA has added a CUDA Occupancy Calculator API that helps developers determine efficient kernel launch configurations.
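To make these two features concrete, here is a minimal sketch that combines them. It assumes a GPU and platform that support unified memory (compute capability 3.0 or later on a 64-bit system); the `scale` kernel and the `run_scale` helper are hypothetical names invented for this illustration, not part of any NVIDIA sample. The runtime calls `cudaMallocManaged` and `cudaOccupancyMaxPotentialBlockSize` are the unified memory allocator and one of the occupancy API entry points, respectively.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical example kernel: multiply each element by a constant factor.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: fills a managed buffer with 1.0f, scales it on the GPU,
// and returns the largest absolute error, or -1.0f if a CUDA call fails.
float run_scale(int n, float factor)
{
    float *data = NULL;

    // Unified memory: one pointer that is valid on both the CPU and the GPU.
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return -1.0f;
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    // Occupancy API: ask the runtime for a block size that maximizes occupancy
    // for this kernel (no dynamic shared memory, no block size limit).
    int minGridSize = 0, blockSize = 0;
    if (cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, scale, 0, 0)
            != cudaSuccess) {
        cudaFree(data);
        return -1.0f;
    }
    int gridSize = (n + blockSize - 1) / blockSize;  // enough blocks to cover n

    scale<<<gridSize, blockSize>>>(data, factor, n);

    // The CPU must not touch managed data until the kernel has finished.
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1.0f;
    }

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i) maxErr = fmaxf(maxErr, fabsf(data[i] - factor));

    cudaFree(data);
    return maxErr;
}

int main()
{
    float err = run_scale(1 << 20, 2.0f);
    printf("max error: %f\n", err);
    return err == 0.0f ? 0 : 1;
}
```

Note that there are no explicit host-to-device copies and no hard-coded block size: the runtime migrates the managed buffer and suggests the launch configuration. Building this with something like `nvcc -arch=sm_30` is an assumption about your hardware; adjust the architecture flag to match your GPUs.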

These and other performance and productivity highlights of CUDA 6.5 are discussed in much greater detail over on NVIDIA’s Parallel Forall blog.

Bright Cluster Manager significantly enhances the something-for-nothing proposition of CUDA 6.5. How so? Well, if you’re already making use of Bright, support for CUDA 6.5 (something) will automatically flow your way (for nothing). Days after the production release of CUDA 6.5, yum-mediated updates from Bright Computing now make this latest version of the toolkit available to our customers. We ensure that Bright:

  • Successfully builds and installs the updated kernel driver from NVIDIA, and ensures that this driver performs as expected in practice;
  • Successfully builds and installs the updated kernel module for unified memory from NVIDIA, and ensures that this module performs as expected in practice;
  • Successfully executes administrative operations using utilities provided by NVIDIA;
  • Integrates with the updated kernel driver so that GPUs can be managed, health checked and monitored via the Bright CLI (CMSH) and GUI (CMGUI). Because monitoring requires metric collection, Bright-managed GPUs can also be reported upon; and finally
  • Builds and executes CUDA and OpenCL test code.

As I continue to evangelize whenever I have the opportunity: you don’t need to choose between different versions of the CUDA toolkit. With Bright Cluster Manager, you can create highly customized development contexts using Environment Modules. Switching between different toolkits is as easy as typing a simple one-liner into your favorite shell.

As with previous releases of the toolkit, CUDA 6.5 significantly advances GPU-based HPC. We are delighted to closely track this release of the toolkit by making it seamlessly available to Bright Computing customers.
