How to run the Intel MPI Benchmark (IMB) on a Bright cluster

This article shows how to run the Intel MPI Benchmark (IMB) on a Bright cluster. In this example we will run the benchmark over low-latency 10GbE interfaces.

Let's get started.

First, change to the IMB directory.

$ cd /cm/shared/apps/imb/current/

Then run the setup.sh script. This creates a benchmark directory in your home directory.

$ ./setup.sh
creating ~/BenchMarks/imb/3.2.3
creating nodesfile with node001 and node002
creating simple runscript
please see ~/BenchMarks/imb/3.2.3
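
If you are curious what the script set up, you can list the new directory. The exact contents vary with the Bright and IMB versions, but it should include the IMB 3.2.3 sources along with the nodes file and run script mentioned in the output above.

$ ls ~/BenchMarks/imb/3.2.3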

Now cd to the benchmark directory.

$ cd ~/BenchMarks/imb/3.2.3

Load the openmpi/gcc module. The second command causes the module to be loaded automatically when you log in, so you only need to do this once.

$ module load openmpi/gcc
$ module initadd openmpi/gcc
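
As a quick sanity check, something like the following should confirm that the module is loaded and that the Open MPI wrappers are on your PATH (the exact paths depend on your installation):

$ module list
$ which mpicc mpirun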

Build the benchmark binaries.

$ make -f make_mpi2
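
Depending on the IMB release, the make_mpi2 target typically produces the IMB-MPI1, IMB-EXT and IMB-IO executables. A quick way to confirm that the PingPong binary used below was built:

$ ls -l IMB-MPI1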

Create an MPI hosts file.

$ cat ~/imb.hosts
node001
node002
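
Only two processes are needed for PingPong, but if you later want to start more than one process per node, Open MPI's machinefile format also accepts a slots count per host. A hypothetical two-slots-per-node version of the same file would look like this:

node001 slots=2
node002 slots=2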

Run the IMB PingPong benchmark. The "--mca btl_openib_cpc_include rdmacm" option tells Open MPI's openib BTL to establish connections with rdmacm, which is needed for the benchmark to run properly over the 10GbE fabric.

$ mpirun --mca btl_openib_cpc_include rdmacm -np 2 -machinefile ~/imb.hosts \
    IMB-MPI1 PingPong

 benchmarks to run PingPong
#---------------------------------------------------
#    Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date                  : Fri Jul 20 18:11:55 2012
# Machine               : x86_64
# System                : Linux
# Release               : 2.6.32-220.el6.x86_64
# Version               : #1 SMP Wed Nov 9 08:03:13 EST 2011
# MPI Version           : 2.1
# MPI Thread Environment:

# New default behavior from Version 3.2 on:

# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time

# Calling sequence was:

# IMB-MPI1 PingPong

# Minimum message length in bytes:   0
# Maximum message length in bytes:   4194304
#
# MPI_Datatype                   :   MPI_BYTE
# MPI_Datatype for reductions    :   MPI_FLOAT
# MPI_Op                         :   MPI_SUM
#
#

# List of Benchmarks to run:

# PingPong

#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         3.40         0.00
            1         1000         3.45         0.28
            2         1000         3.42         0.56
            4         1000         3.43         1.11
            8         1000         3.47         2.20
           16         1000         3.47         4.40
           32         1000         3.47         8.79
           64         1000         3.64        16.77
          128         1000         4.78        25.51
          256         1000         5.00        48.84
          512         1000         5.37        90.85
         1024         1000         6.41       152.34
         2048         1000         7.35       265.62
         4096         1000         9.56       408.48
         8192         1000        14.26       547.98
        16384         1000        25.22       619.43
        32768         1000        39.59       789.26
        65536          640        68.27       915.42
       131072          320       125.73       994.22
       262144          160       240.46      1039.69
       524288           80       469.95      1063.94
      1048576           40       928.78      1076.68
      2097152           20      1846.30      1083.25
      4194304           10      3682.65      1086.17

# All processes entering MPI_Finalize
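
PingPong is only one of the benchmarks in the IMB-MPI1 binary. Invoking IMB-MPI1 without a benchmark name runs the whole MPI-1 suite, and several benchmark names can be listed at once. For example, a run like the following (using the same rdmacm option) should exercise Sendrecv and Allreduce across both nodes:

$ mpirun --mca btl_openib_cpc_include rdmacm -np 2 -machinefile ~/imb.hosts \
    IMB-MPI1 Sendrecv Allreduce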

We're done.
