By Robert Stober | November 29, 2012 | MPI, Intel MPI Benchmark, IMB
This article shows how to run the Intel MPI Benchmark (IMB) on a Bright cluster. In this example we will run the benchmark over low-latency 10GbE interfaces.
Let's get started.
First, change to the IMB installation directory.
$ cd /cm/shared/apps/imb/current/
Then run the setup.sh script. This creates a benchmark directory in your home directory.
$ ./setup.sh
creating ~/BenchMarks/imb/3.2.3
creating nodesfile with node001 and node002
creating simple runscript
please see ~/BenchMarks/imb/3.2.3
Now cd to the benchmark directory.
$ cd ~/BenchMarks/imb/3.2.3
Load the openmpi/gcc module. The second command causes the module to be loaded automatically when you log in, so you only need to do this once.
$ module load openmpi/gcc
$ module initadd openmpi/gcc
Build the benchmark binaries.
$ make -f make_mpi2
Create an MPI hosts file that lists one node per line. It should look like this:
$ cat ~/imb.hosts
node001
node002
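One way to create that file is with a here-document. This is a minimal sketch that assumes the same two nodes, node001 and node002, that setup.sh put in its nodesfile; substitute the hostnames of the nodes you want to test.

```shell
# Write one hostname per line to ~/imb.hosts.
# node001 and node002 are the two nodes used in this article; adjust as needed.
cat > ~/imb.hosts <<'EOF'
node001
node002
EOF

# Show the result to confirm it matches the listing above.
cat ~/imb.hosts
```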
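Run the IMB PingPong benchmark. The "--mca btl_openib_cpc_include rdmacm" argument tells Open MPI's openib transport to set up connections through the RDMA connection manager (rdmacm). It was added so that the benchmark runs properly over the 10GbE fabric.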
$ mpirun --mca btl_openib_cpc_include rdmacm -np 2 -machinefile ~/imb.hosts \
IMB-MPI1 PingPong
benchmarks to run PingPong
#---------------------------------------------------
# Intel (R) MPI Benchmark Suite V3.2.3, MPI-1 part
#---------------------------------------------------
# Date : Fri Jul 20 18:11:55 2012
# Machine : x86_64
# System : Linux
# Release : 2.6.32-220.el6.x86_64
# Version : #1 SMP Wed Nov 9 08:03:13 EST 2011
# MPI Version : 2.1
# MPI Thread Environment:
# New default behavior from Version 3.2 on:
# the number of iterations per message size is cut down
# dynamically when a certain run time (per message size sample)
# is expected to be exceeded. Time limit is defined by variable
# "SECS_PER_SAMPLE" (=> IMB_settings.h)
# or through the flag => -time
# Calling sequence was:
# IMB-MPI1 PingPong
# Minimum message length in bytes: 0
# Maximum message length in bytes: 4194304
#
# MPI_Datatype : MPI_BYTE
# MPI_Datatype for reductions : MPI_FLOAT
# MPI_Op : MPI_SUM
#
#
# List of Benchmarks to run:
# PingPong
#---------------------------------------------------
# Benchmarking PingPong
# #processes = 2
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000         3.40         0.00
            1         1000         3.45         0.28
            2         1000         3.42         0.56
            4         1000         3.43         1.11
            8         1000         3.47         2.20
           16         1000         3.47         4.40
           32         1000         3.47         8.79
           64         1000         3.64        16.77
          128         1000         4.78        25.51
          256         1000         5.00        48.84
          512         1000         5.37        90.85
         1024         1000         6.41       152.34
         2048         1000         7.35       265.62
         4096         1000         9.56       408.48
         8192         1000        14.26       547.98
        16384         1000        25.22       619.43
        32768         1000        39.59       789.26
        65536          640        68.27       915.42
       131072          320       125.73       994.22
       262144          160       240.46      1039.69
       524288           80       469.95      1063.94
      1048576           40       928.78      1076.68
      2097152           20      1846.30      1083.25
      4194304           10      3682.65      1086.17
# All processes entering MPI_Finalize
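The Mbytes/sec column can be cross-checked against the #bytes and t[usec] columns. The sketch below recomputes it for three rows copied from the output above. It assumes, as these figures suggest, that this IMB version counts 1 Mbyte as 2^20 bytes. The awk one-liner is only an illustration and is not part of IMB.

```shell
# Three rows copied from the PingPong output: bytes, repetitions, t[usec], Mbytes/sec.
rows='1024 1000 6.41 152.34
1048576 40 928.78 1076.68
4194304 10 3682.65 1086.17'

# bandwidth = (bytes / 2^20) / (t[usec] * 1e-6)
# The 2^20 divisor is an assumption inferred from the numbers in this run.
result=$(printf '%s\n' "$rows" |
  awk '{ printf "%d %.2f\n", $1, ($1 / 1048576) / ($3 * 1e-6) }')

# Print message size and recomputed Mbytes/sec for comparison with the table.
echo "$result"
```

The recomputed values should agree with the table to within the rounding of the printed times. PingPong reports half the round-trip time, so the 0-byte row (3.40 usec) is the one-way latency, and the largest messages show a sustained bandwidth of about 1086 Mbytes/sec.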
We're done.