The North-German Supercomputing Alliance (HLRN) offers a high performance computing (HPC) cluster with modern Cray XC30 and XC40 nodes. The main advantage (besides fast processors and a large amount of RAM) is the opportunity to run massively parallel operations in combination with the Message Passing Interface (MPI). R can make use of these resources. The CRAN Task View page offers a good overview.
Here, I will show you how to compile R on the HLRN, including the Rmpi package. Successful compilation depends on a number of settings and installed packages, which may differ on other systems. Therefore, this tutorial may not work on other Cray XC systems!
The following settings and programs were used for compiling:
- The GNU environment (v5.2.40)
- R v3.1.2
- Rmpi v0.6-5
- cray-mpich v7.0.5; different v6 versions work, too.
- Everything is done on the $WORK partition. $HOME may be an alternative, but do not use $TEMPDIR or $PERM. The design of other Cray clusters may differ.
The HLRN does not offer an R installation (neither as a standard install nor as a module).
Therefore, you have to compile it by hand.
The main drawback is that the HLRN does not allow any outgoing connections.
That means you have to upload everything yourself, including all packages and dependencies (which have to be resolved by hand :-( ).
I created an archive with all necessary and optional packages (R v3.1.2, Rmpi v0.6-5, snow v0.3-13, doSNOW v1.0.12, foreach v1.4.2, iterators v1.0.7).
I will try to keep this archive up to date.
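If you want to assemble such a bundle yourself, the package sources can be fetched on any machine with internet access, for example from within R. This is only a sketch: the destination folder is an example, the R source tarball itself has to be downloaded separately from CRAN, and download.packages() fetches the current CRAN versions, which may differ from the ones listed above.

```r
## Run this on a machine WITH internet access, not on the HLRN.
## download.packages() does not resolve dependencies, so the full
## closure is listed by hand (doSNOW needs foreach, which needs iterators).
dir.create("bundle")
download.packages(c("Rmpi", "snow", "doSNOW", "foreach", "iterators"),
                  destdir = "bundle", type = "source")
```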
After uploading the archive (or the individual packages) to the $WORK directory (I am using a subfolder called temp), please log in using an SSH client of your choice and extract the archive (or the R sources).
$ cd $WORK/temp
$ tar xzvf rmpi_cray_hlrn_20150102.tar.gz
$ tar xzvf R-3.1.2.tar.gz
The next step is the configuration. Before switching to the GNU environment, you should store the path of the mpich directory, as it will change in the GNU environment and OpenMPI is not installed (there seems to be only some wrapper). Caution: the variable $MPICH_DIR points to the 32-bit version of the libraries. Do not forget to append 64 (cf. below)! Pointing to the wrong libraries will cause compile failures.
All binaries and libraries will be installed in $WORK/R, and we will turn off X11 support as the clusters are headless. Once the configuration has run through, you can compile and install R. You may fill up your cup of coffee meanwhile, but do not make a fresh one - the nodes are pretty fast ;-)
In a last step, you can create a symlink in your personal bin directory.
$ $MPICH_DIR
/opt/cray/mpt/7.0.5/gni/mpich2-cray/83: is a directory
$ export MPICH_CRAY=/opt/cray/mpt/7.0.5/gni/mpich2-cray64/83
$ module swap PrgEnv-cray PrgEnv-gnu
$ cd R-3.1.2
$ ./configure --prefix=$WORK/R --with-x=no
$ make && make install
$ ln -s $WORK/R/bin/R $HOME/bin/R
Try to load R by typing “R” in the console. If you see something like the output below, you are ready for the next step :-)
blogin2:/gfs1/work/<username>/temp $ R

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
The installation of Rmpi is a little more complicated, as the compiler needs to know where the MPI libraries are. Furthermore, the Cray libraries seem to need Cray's own compiler (cc). Do not forget --no-test-load: you cannot load Rmpi on the login nodes, as MPI is only available on the compute nodes.
$ cd $WORK/temp/packages
$ MAKEFLAGS="CC=cc" R CMD INSTALL Rmpi_0.6-5.tar.gz \
    --configure-args="--with-Rmpi-include=$MPICH_CRAY/include \
    --with-Rmpi-libpath=$MPICH_CRAY/lib \
    --with-Rmpi-type=CRAY CC=cc" \
    --no-test-load
This installation routine is only necessary for Rmpi. Further packages can be compiled/installed without any configuration arguments.
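As a sketch, the optional packages from the bundle can also be installed from inside R in dependency order (the file names are those from the archive above; install.packages() with repos = NULL installs local source tarballs):

```r
## Install the remaining source tarballs; the order matters because
## dependencies cannot be resolved automatically without internet access.
setwd(file.path(Sys.getenv("WORK"), "temp", "packages"))
install.packages(c("iterators_1.0.7.tar.gz", "foreach_1.4.2.tar.gz",
                   "snow_0.3-13.tar.gz", "doSNOW_1.0.12.tar.gz"),
                 repos = NULL, type = "source")
```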
If Rmpi compiled successfully, you can use the following little example to test it. Please install the snow package beforehand. To use the batch system, you need two files: a shell script that specifies the requested resources and executes the R script, and the R script itself. Upload both files to the $WORK directory and submit the job using the msub command.
#!/bin/bash
#PBS -N mpi-test
#PBS -l nodes=2:ppn=5
#PBS -l walltime=00:10:00
#PBS -q mpp1testq
#PBS -l naccesspolicy=shared
#PBS -m abe
#PBS -M YOUREMAIL # please change!
#PBS -j oe
#PBS -o $WORK/$PBS_JOBNAME.bash_output
#PBS -e $WORK/$PBS_JOBNAME.bash_error

# Swap to GNU environment
module swap PrgEnv-cray PrgEnv-gnu

# go to the right directory
cd $WORK

# load the snow profile (needed)
R_PROFILE=$WORK/R/lib64/R/library/snow/RMPISNOWprofile; export R_PROFILE

# run the code
aprun -B R --no-save --no-restore CMD BATCH mpi_test.R $PBS_JOBNAME.R_output
library(snow) # is loaded automatically
#library(Rmpi)
cl <- getMPIcluster()
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
clusterCall(cl, runif, 3)
mpi.quit()
The R output file should contain the names of the two nodes (5 cores per node) and three random numbers computed on each core. It may look like this:
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]]
  nodename    machine
"nid00675"   "x86_64"

[[2]]
  nodename    machine
"nid00675"   "x86_64"

[[3]]
  nodename    machine
"nid00675"   "x86_64"

[[4]]
  nodename    machine
"nid00675"   "x86_64"

[[5]]
  nodename    machine
"nid00676"   "x86_64"

...

> clusterCall(cl, runif, 3)
[[1]]
[1] 0.05371750 0.96496237 0.04513574

[[2]]
[1] 0.6419490 0.4391379 0.7554963

[[3]]
[1] 0.6862853 0.8297459 0.6580091

[[4]]
[1] 0.5291690 0.2849300 0.8945238

[[5]]
[1] 0.05210353 0.55601896 0.52931024

...
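The optional doSNOW and foreach packages from the bundle plug into the same cluster. A minimal sketch, assuming the same RMPISNOWprofile job setup as above (the sqrt loop body is just a placeholder for your own workload):

```r
library(doSNOW)          # also pulls in foreach and iterators
cl <- getMPIcluster()    # the cluster created by the RMPISNOW profile
registerDoSNOW(cl)       # register it as the parallel backend
## distribute ten independent iterations over the workers
res <- foreach(i = 1:10, .combine = c) %dopar% sqrt(i)
print(res)
mpi.quit()
```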