How to download and run a CUDA 4.1 demo job using the Bright CMSH

    

With Bright Cluster Manager, it takes only five quick steps to download and run a CUDA 4.1 demo job. In this example I will use the Bright CMSH.

Let's get started.


Prepare to run the demo job

The demo job looks for the shared library libcudart.so.3, which was included in CUDA 3.x but is not included in CUDA 4.x. We'll work around this by creating a symbolic link named libcudart.so.3 that points to the CUDA 4.1 runtime library.

[root@atom-head1 tmp]# cd /cm/shared/apps/cuda41/toolkit/current/lib64/
[root@atom-head1 lib64]# ln -s libcudart.so.4.1.28 libcudart.so.3
[root@atom-head1 lib64]# ls -l libcudart.so*
lrwxrwxrwx 1 root root     14 Sep  7 16:15 libcudart.so -> libcudart.so.4
lrwxrwxrwx 1 root root     19 Sep  7 17:32 libcudart.so.3 -> libcudart.so.4.1.28
lrwxrwxrwx 1 root root     19 Sep  7 16:15 libcudart.so.4 -> libcudart.so.4.1.28
-rwxr-xr-x 1 root root 353968 Jun 29 20:03 libcudart.so.4.1.28
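If you expect to repeat this workaround on more than one cluster, it can be wrapped in a small helper that creates the link only when it is missing, so running it twice is harmless. This is a minimal sketch, not something Bright Cluster Manager provides: the function name link_compat_lib is hypothetical, and the directory and library names in the example comment are simply the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: create a compatibility symlink only if it is not there yet.
# Usage: link_compat_lib <lib_dir> <real_library> <compat_name>
link_compat_lib() {
    lib_dir=$1
    target=$2
    link_name=$3

    # Refuse to point the link at a library that does not exist.
    if [ ! -e "$lib_dir/$target" ]; then
        echo "missing: $lib_dir/$target" >&2
        return 1
    fi

    # Leave an existing file or link (even a dangling one) untouched.
    if [ -e "$lib_dir/$link_name" ] || [ -L "$lib_dir/$link_name" ]; then
        echo "exists: $lib_dir/$link_name"
        return 0
    fi

    # Create a relative link, the same form as the manual ln -s above.
    ( cd "$lib_dir" && ln -s "$target" "$link_name" ) &&
        echo "created: $lib_dir/$link_name"
}

# Example (as root on the head node):
# link_compat_lib /cm/shared/apps/cuda41/toolkit/current/lib64 \
#     libcudart.so.4.1.28 libcudart.so.3
```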

Download the demo job

[rstober@atom-head1 ~]$ wget https://dl.dropbox.com/u/2999184/OEM_Toolkit_Linux_64.tar.gz
--2012-09-08 11:29:28--  https://dl.dropbox.com/u/2999184/OEM_Toolkit_Linux_64.tar.gz
Resolving dl.dropbox.com... 23.23.133.20, 107.22.246.178, 23.21.146.171, ...
Connecting to dl.dropbox.com|23.23.133.20|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14169096 (14M) [application/x-tar]
Saving to: ‘OEM_Toolkit_Linux_64.tar.gz’

100%[============================================================>] 14,169,096  2.25M/s   in 6.1s

2012-09-08 11:29:36 (2.22 MB/s) - ‘OEM_Toolkit_Linux_64.tar.gz’

Verify that the demo job downloaded successfully

[rstober@atom-head1 ~]$ ls -l
total 13868
drwxrwxr-x 2 rstober rstober     4096 Sep  7 17:35 bin
drwxrwxr-x 3 rstober rstober     4096 Sep  7 16:25 dev
-rw-rw-r-- 1 rstober rstober 14169096 Sep  8 11:29 OEM_Toolkit_Linux_64.tar.gz
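The size in the listing (14169096 bytes) matches the Length that wget reported. That comparison can also be scripted, as in the sketch below. The function name check_size is hypothetical, and a size match only shows the transfer was not cut short. It does not prove the contents are intact.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE is exactly EXPECTED bytes long.
# Usage: check_size <file> <expected_bytes>
check_size() {
    file=$1
    expected=$2

    if [ ! -f "$file" ]; then
        echo "MISSING: $file" >&2
        return 1
    fi

    # wc -c prints the byte count; tr strips the padding some wc versions add.
    actual=$(wc -c < "$file" | tr -d ' ')

    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $file is $actual bytes"
    else
        echo "MISMATCH: $file is $actual bytes, expected $expected" >&2
        return 1
    fi
}

# Example, using the Length reported by wget:
# check_size OEM_Toolkit_Linux_64.tar.gz 14169096
```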

Install the demo job

[rstober@atom-head1 ~]$ tar xvzf OEM_Toolkit_Linux_64.tar.gz
OEM_Toolkit_Linux_64/
OEM_Toolkit_Linux_64/oem_testkit.sh
OEM_Toolkit_Linux_64/libcublas.so.3
OEM_Toolkit_Linux_64/oem_dgemm/
OEM_Toolkit_Linux_64/oem_dgemm/libcublas.so.3
OEM_Toolkit_Linux_64/oem_dgemm/oem_dgemm
OEM_Toolkit_Linux_64/oem_dgemm/libcudart.so.3
OEM_Toolkit_Linux_64/libcudart.so.3
OEM_Toolkit_Linux_64/bin/
OEM_Toolkit_Linux_64/bin/oem_bandwidthTest
OEM_Toolkit_Linux_64/bin/run_oem_binom.sh
OEM_Toolkit_Linux_64/bin/oem_query
OEM_Toolkit_Linux_64/bin/oem_binom
OEM_Toolkit_Linux_64/bin/oem_mtxmul
OEM_Toolkit_Linux_64/README.txt
OEM_Toolkit_Linux_64/oem_query.sh
OEM_Toolkit_Linux_64/oem_binom.sh
OEM_Toolkit_Linux_64/oem_dgemm.sh
OEM_Toolkit_Linux_64/oem_mtxmul.sh
OEM_Toolkit_Linux_64/oem_bandwidthTest.sh
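To check what an archive like this contains without unpacking it, tar can list the member names instead of extracting them. The sketch below tests for one specific member. The function name archive_has is hypothetical, and the member path in the example comment is taken from the listing above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if ARCHIVE (a gzip-compressed tarball) lists MEMBER.
# Usage: archive_has <archive> <member_path>
archive_has() {
    archive=$1
    member=$2

    # "tar tzf" lists member names without extracting anything;
    # "grep -F -x" matches the whole line as a literal string.
    tar tzf "$archive" | grep -F -x -- "$member" >/dev/null
}

# Example:
# archive_has OEM_Toolkit_Linux_64.tar.gz OEM_Toolkit_Linux_64/bin/run_oem_binom.sh \
#     && echo "runner script is in the archive"
```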

Log into a node that has GPUs

[rstober@atom-head1 ~]$ ssh atom04
Last login: Sat Sep  8 11:40:50 2012 from atom-head1.cm.cluster
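If you are not sure whether the node you landed on actually has a GPU, the nvidia-smi utility that ships with the NVIDIA driver can list the devices it sees. The sketch below assumes nvidia-smi is on the PATH and that each device line printed by its -L option begins with "GPU ". The function name has_gpu is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether the NVIDIA driver on this node lists any GPU.
has_gpu() {
    # Without nvidia-smi there is nothing to query.
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found on $(uname -n)" >&2
        return 1
    fi

    # -L prints one line per detected GPU; count the lines that start with "GPU ".
    # "|| true" keeps a count of zero from being treated as a command failure.
    count=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true)

    echo "GPUs detected: $count"
    [ "$count" -gt 0 ]
}

# Example, run on the compute node after logging in:
# has_gpu || echo "pick another node"
```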

Run the demo job

Note that I've omitted the vast majority of the output.

[rstober@atom04 ~]$ cd OEM_Toolkit_Linux_64
[rstober@atom04 OEM_Toolkit_Linux_64]$ bin/run_oem_binom.sh

. . .
test output 0 - 998 omitted
. . .
TEST PASSED
ITERATION COUNT = 999 of 1000
Executing GPU kernel...
Options count            : 512
Time steps               : 2048
GPU RUN TIME             : 33.855999 msec
Options per second       : 15122.873796
Reading back the results...
Checking the results...
...comparing the results.
L1 norm: 7.468465E-07
Max absolute error: 1.411438E-04
TEST PASSED
Shutting down...
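With 1000 iterations of output, counting the "TEST PASSED" lines is quicker than scrolling through them. The sketch below counts them from standard input. The function name count_passed and the log file name binom.log are hypothetical, and the only assumption about the output is that each successful iteration prints a line containing "TEST PASSED", as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count the lines on standard input that contain "TEST PASSED".
count_passed() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps that case from looking like a failure.
    grep -c 'TEST PASSED' || true
}

# Example: keep the full log and print the number of passing iterations.
# bin/run_oem_binom.sh | tee binom.log | count_passed
```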

 

We're done.
