How to install & verify CUDA 4.1 with the Bright Cluster Manager CMSH

Bright Cluster Manager makes HPC management tasks like this one fast and easy. In this article, I will show you how to install and verify CUDA 4.1 on a Linux cluster running Bright, using the cluster management shell, cmsh. In this example, the node atom04 has two NVIDIA Tesla C2075 GPUs installed; the other nodes (atom01 to atom03) have none.

Let's get started.

First, install the shared packages (toolkit, SDK, and profiler) on the head node.

[root@atom-head1 tmp]# yum install cuda41-toolkit cuda41-sdk cuda41-profiler

Next, install the driver package into the software image that the compute nodes boot from.

[root@atom-head1 tmp]# yum --installroot=/cm/images/default-image install cuda41-driver
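Before rebooting, it can be worth confirming that the packages from the two steps above actually landed where you expect. The helper below is a minimal sketch, not part of Bright or CUDA: it wraps rpm -q, optionally with rpm's --root option pointed at the software image, and prints any package that is not installed. The function name missing_pkgs and the RPM override variable are illustrative assumptions.

```shell
#!/bin/bash
# missing_pkgs: hypothetical helper (not part of Bright or CUDA).
# Usage: missing_pkgs [--root DIR] PACKAGE...
# Prints each package that rpm reports as not installed and returns 1
# if anything is missing, 0 otherwise.
# RPM can be overridden (e.g. with a stub for testing); it defaults to rpm.
missing_pkgs() {
    local root="" rc=0 pkg
    if [ "$1" = "--root" ]; then
        root=$2
        shift 2
    fi
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed;
        # --root makes rpm query the package database under DIR instead
        if ! "${RPM:-rpm}" ${root:+--root "$root"} -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, run missing_pkgs cuda41-toolkit cuda41-sdk cuda41-profiler on the head node, and missing_pkgs --root /cm/images/default-image cuda41-driver to check the image. No output means everything is in place.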

Reboot the nodes. On reboot, the nodes synchronize with the software image, and the CUDA driver is compiled and started. The -c default option applies the command to every node in the default category.

[root@atom-head1 tmp]# cmsh -c "device reboot -c default"
atom01: Reboot in progress ...
atom02: Reboot in progress ...
atom03: Reboot in progress ...
atom04: Reboot in progress ...
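Rather than guessing when the nodes are back, you can poll them. This is a minimal sketch that assumes passwordless ssh from the head node to the compute nodes (as in the login session later in this article); the function name wait_for_node, its default timeout and interval, and the SSH override variable are hypothetical.

```shell
#!/bin/bash
# wait_for_node: hypothetical helper (not part of Bright).
# Usage: wait_for_node HOST [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Retries a trivial ssh command until it succeeds or TIMEOUT is reached.
# SSH can be overridden (e.g. with a stub for testing); it defaults to ssh.
wait_for_node() {
    local host=$1 timeout=${2:-600} interval=${3:-10} waited=0
    [ "$interval" -ge 1 ] || interval=1   # avoid a zero-second busy loop
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout=5 limits each connection attempt to 5 seconds
    until "${SSH:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$host: not reachable after ${waited}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "$host: up"
}
```

A loop such as for n in atom01 atom02 atom03 atom04; do wait_for_node "$n"; done then returns once every node answers.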

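Once the nodes are back up, check the driver status on every node in the category. Nodes with GPUs should report their CUDA devices. The -j option joins identical output, so atom01 through atom03 are reported together: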

[root@atom-head1 tmp]# cmsh -c "device pexec -c default -j service cuda41-driver status"
[atom04]
nvidia module loaded.
2 device(s) present

[atom01..atom03]
nvidia module not loaded
No device(s) created
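On a larger cluster it is handy to reduce that output to just the nodes that found GPUs. The awk filter below is a minimal sketch; its parsing relies only on the line formats shown in the sample output above ("[node]" headers and "N device(s) present" lines), which may differ in other versions, and the function name gpu_nodes is hypothetical.

```shell
#!/bin/bash
# gpu_nodes: hypothetical helper that reads the output of
#   cmsh -c "device pexec -c default -j service cuda41-driver status"
# on stdin and prints "<node group> <device count>" for each group whose
# status reports "N device(s) present". Formats are taken from the sample
# output in this article and are an assumption for other versions.
gpu_nodes() {
    awk '
        /^\[.*\]$/ { group = substr($0, 2, length($0) - 2); next }  # e.g. [atom04]
        /device\(s\) present/ { print group, $1 }                   # e.g. 2 device(s) present
    '
}
```

Piping the status command into gpu_nodes would print "atom04 2" for the cluster in this example.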

Time to verify.

Now that CUDA is installed and the driver is loaded on atom04, let's log in to atom04 as a regular user and verify the installation.
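Before building the whole SDK, a quicker sanity check is to count the GPU device nodes the driver created on atom04. This minimal sketch assumes the usual NVIDIA Linux driver layout of one /dev/nvidiaN node per GPU plus /dev/nvidiactl; on atom04 it should print 2, matching the status output above. The function name count_nvidia_devices is hypothetical.

```shell
#!/bin/bash
# count_nvidia_devices: hypothetical helper; counts /dev/nvidiaN device
# nodes (one per GPU once the driver is loaded), ignoring /dev/nvidiactl.
# Usage: count_nvidia_devices [DEVDIR]   (DEVDIR defaults to /dev)
count_nvidia_devices() {
    local dir=${1:-/dev} n=0 f
    for f in "$dir"/nvidia[0-9]*; do
        # if nothing matches, the glob stays literal and -e is false
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```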

[root@atom-head1 tmp]# su - rstober
[rstober@atom-head1 ~]$ ssh atom04
Last login: Fri Sep  7 16:49:55 2012 from atom-head1.cm.cluster
[rstober@atom04 ~]$ module load cuda41/toolkit
unloading gcc module
[rstober@atom04 ~]$ cd $CUDA_SDK
[rstober@atom04 4.1.28]$ ./verify_cuda41.sh
Copy cuda41 sdk files to "/tmp/cuda41" directory.

make clean

make (may take a while)
Run all tests? (y/N)? y

I've omitted the output as it's very long, but the tests completed successfully.

. . .
. . .
All cuda41 just compiled test programs can be found in the "/tmp/cuda41/C/bin/linux/release/" directory
They can be executed from the "/tmp/cuda41/C" directory.

The "/tmp/cuda41" directory may take up a lot of diskspace.
Use "rm -rf /tmp/cuda41" to remove the data.
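Because the test output is so long, you may prefer to answer the prompt automatically and keep the output in a log file. This is a minimal sketch under two assumptions: that verify_cuda41.sh reads its "Run all tests? (y/N)?" answer from standard input, and that its exit status is meaningful. The wrapper name run_verify and the default log file name are hypothetical.

```shell
#!/bin/bash
# run_verify: hypothetical wrapper (not part of the CUDA SDK).
# Answers "y" to the verification prompt, shows the output, and saves it.
# Assumes the script reads its answer from standard input.
# Usage: run_verify [SCRIPT] [LOGFILE]
run_verify() {
    local script=${1:-./verify_cuda41.sh} log=${2:-verify_cuda41.log}
    echo y | "$script" 2>&1 | tee "$log"
    # PIPESTATUS[1] is the exit status of the script itself, not of tee
    return "${PIPESTATUS[1]}"
}
```

After module load cuda41/toolkit and cd $CUDA_SDK, running run_verify leaves the full output in verify_cuda41.log for later inspection.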

We're done.
