To run our code on the cluster, you have to get onto the cluster, configure R, get the code, and work with the scheduler. Here are the steps.
ssh <submit-node>.uw.edu
qlogin -l h_rt=12:00:00,fthread=2,m_mem_free=4G -q i.q -P proj_mmc -now no
If you have one qlogin session and want a second one on the same node, look up the hostname by running the command hostname
inside the qlogin. Then ssh to the submit node again and, from there, ssh to the worker node:
ssh <worker-node>.uw.edu
I’m telling you this because it’s hard to guess that the full hostname has five parts.
mkdir -p dev
cd dev
git clone git@github.com:dd-harp/analytics-pipeline.git
git clone git@github.com:dd-harp/pr2ar.git
Check that R is configured. You configure R on the cluster by installing an .Rprofile, like this one.
$ cat ~/.Rprofile
.libPaths(c("~/share/R3.6.2", .libPaths()))
options(repos = structure(c(CRAN = "https://repo.miserver.it.umich.edu/cran")))
That tells R where to put its libraries when it runs, and which CRAN mirror to install from.
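A quick way to confirm the .Rprofile was read, from inside the R singularity image:
.libPaths()        # the ~/share/R3.6.2 path should come first
getOption("repos") # should show the umich CRAN mirror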
Copy a script to run R. IHME provides a command to run R, but it doesn’t read the .Rprofile. We hacked it to load the .Rprofile so that we can a) install our own external packages and b) use our code as packages. The three parts are
- ~adolgert/bin/rsing, command-line R.
- ~adolgert/bin/execRscript.sh, which runs a script as execRscript.sh <scriptname>.
- ~adolgert/bin/rsingbash, a Bash command line within the R singularity image. This helps you figure out what your script sees when it runs. For instance, git isn’t installed on the singularity image, so you can’t use git.
So copy those if you need them. Then install our package from inside R:
devtools::install("~/dev/pr2ar")
If you’re re-installing pr2ar a million times because you’re editing it, try an alternative installation call.
devtools::install("~/dev/pr2ar", dependencies = FALSE, upgrade = FALSE)
If you see compilation messages like this, everything looks fine, but you might be in trouble.
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I. -I/usr/local/include -UDEBUG -DNDEBUG -DU_HAVE_ELF_H -w -fno-gnu-unique -fno-optimize-sibling-calls -DMKL_ILP64 -m64 -I/opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/include -I/usr/include -msse2 -mfpmath=sse -g -O3 -fPIC -fPIC -w -fno-gnu-unique -fno-optimize-sibling-calls -DMKL_ILP64 -m64 -I/opt/intel/compilers_and_libraries_2019.5.281/linux/mkl/include -I/usr/include -msse2 -mfpmath=sse -g -O3 -c stri_trans_normalization.cpp -o stri_trans_normalization.o
The problem is that the sse and sse2 optimizations may work on one computer but not another. If you later see an error called “illegal instruction,” then you have to go back and install every failing package with those optimizations turned off. That means R’s default install won’t work; you need to customize installation for the failing packages. And maybe tell Infra that you’d like a cluster that uses the same processors for all of the nodes.
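One way to turn the optimizations off, as a sketch: put conservative flags in ~/.R/Makevars, which overrides the compiler flags R uses when it builds packages, then reinstall the failing package (stringi, in the log above). I’m assuming here that the singularity image reads your home directory.
dir.create("~/.R", showWarnings = FALSE)
writeLines(c(
  "CFLAGS = -O2 -g",
  "CXXFLAGS = -O2 -g",
  "CXX11FLAGS = -O2 -g"
), "~/.R/Makevars")
install.packages("stringi")  # rebuilds without the -msse2 flags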
Test that it loads with library(pr2ar).
Go to the Hub and search for “R/RStudio Cluster Cheat Sheet”. Or try this link: https://hub.ihme.washington.edu/pages/viewpage.action?pageId=60485097
Then run
singRstudio -P proj_mmc
You’ll see something like this, where the last line tells you where to point your web browser.
$ singRstudio -P proj_mmc
--------------------------------------------------------------------------------
RStudio Qsub Script Info
----------------------------------------
Script Name : 'rstudio_qsub_script'
Version : 2.0.1
Number of threads set to: 2 (default)
Maximum memory set to : '1G' (default)
Queue set to : 'all.q' (default)
Runtime set to : '03:00:00:00' (default)
Cluster set to : 'prod' (default)
Project set to : 'proj_mmc'
Port set to : '7285' (based on user name)
Image type : LBD Team (default)
Image version set to : /share/singularity-images/lbd/rstudio/shells/lbd_rstudio_shells/rstudio_shell_9999_current.sh
Requesting archive nodes: TRUE (default)
Submitting RStudio job with the following command:
'qsub -N rstudio_ide -j y -o /share/temp/sgeoutput/adolgert/output -now no -P proj_mmc -l fthread=2 -l m_mem_free=1G -l h_rt=03:00:00:00 -q all.q -l archive=TRUE /share/singularity-images/lbd/rstudio/shells/lbd_rstudio_shells/rstudio_shell_9999_current.sh 7285'
Your job 40923023 ("rstudio_ide") has been submitted
http://gen-uge-archive-p207.cluster.ihme.washington.edu:7285
That last line is a URL. Point your web browser there, and you’ll see RStudio.
Is it pointing to your installed libraries? Check for your library with
.libPaths()
I had a problem with the versions not matching, so I ran this to make them match.
.libPaths(c("/ihme/homes/adolgert/share/R3.6.1", "/usr/local/R-3.6.1/library"))
I finally found the post that explains how to run RStudio on the cluster using the Scicomp version:
Hi, RStudio singularity users, we just released 3.6.0.2 with bug fixes and new packages. Here is the release note: https://hub.ihme.washington.edu/pages/viewpage.action?spaceKey=DataScience&title=GBD+Rstudio+Singularity+Image+Release+Note. Since it's still in beta, the default image is still 3.6.0.1. If you want to use this image, please specify -i /ihme/singularity-images/rstudio/ihme_rstudio_3602-beta.img on your command. Example:
sh /ihme/singularity-images/rstudio/shells/jpy_rstudio_qsub_script.sh -i /ihme/singularity-images/rstudio/ihme_rstudio_3602-beta.img -t rstudio -P proj_burdenator -G w
Maybe that helps. The previous instructions use the geospatial (LBD) version because that’s the team that wrote instructions.
Sometimes the permissions aren’t right. These commands fix them by moving files you own out of the Domain Users group into ihme-malaria and making them group-readable and writable. Note that the ihme-malaria group id is 700187.
cd /ihme/malaria_modeling
find . -type d -user adolgert -group "Domain Users" -exec chown adolgert:ihme-malaria {} \;
find . -type f -user adolgert -group "Domain Users" -exec chown adolgert:ihme-malaria {} \;
find . -user adolgert -exec chmod ug+rwX {} \;
Modify the out_dir in the gen_scaled_ar.R script so that it writes to a new location.
Get the qsub string by running a script to print it.
execRscript.sh print_launch.R
Then copy the qsub string and paste it into the command line. It will be something like this.
qsub -l m_mem_free=2.0G -l fthread=1 -l h_rt=24:00:00 -l archive=True -q all.q -cwd -P proj_mmc -e /share/temp/sgeoutput/adolgert/errors -o /share/temp/sgeoutput/adolgert/output -N gen_scaled_ar -t 1:13500 /homes/adolgert/bin/execRscript.sh /homes/adolgert//dev/analytics-pipeline/gen_scaled_ar/gen_scaled_ar.R
The qsub won’t run from inside R because SGE_ROOT is defined as /opt/sge, which isn’t in the singularity image. Whatever. Run the command from the command line. Look at the job ID that’s returned. Use qstat to see the jobs running. When they are done, use
qacct -j <jobid> | grep exit_status | uniq
to see whether they all exited with status 0.
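For context, the -t 1:13500 in that qsub makes this an array job, so each copy of gen_scaled_ar.R finds its work item from the task index in the environment. I’d guess the pattern inside the script looks something like this; SGE_TASK_ID is the real SGE variable, the rest is illustration.
task_id <- as.integer(Sys.getenv("SGE_TASK_ID"))  # 1 through 13500, one per task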
The files should be in the out_dir from gen_scaled_ar.R, which is currently set to /ihme/malaria_modeling/projects/uganda2020/outputs/practice_draws/. Check that they were made with the right owner, group, and permissions, like this, so that ihme-malaria owns them and they are read-write for owner and group.
-rw-rw-r-- 1 adolgert ihme-malaria 6619 May 2 18:13 1_1.csv
Combine the output results by running another script. Edit this script first so that it knows where to put its outputs without overwriting.
execRscript.sh combine_pr_files.R
Do you want to know how long the jobs took? After they finish, run
qacct -j gen_scaled_ar | grep ^wallclock | cut -d' ' -f5
This gives wallclock seconds for each task. Currently it’s about two hours per draw.
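If you redirect that output to a file, here’s a quick way to summarize it in R; wallclock.txt is a hypothetical filename.
secs <- as.numeric(readLines("wallclock.txt"))
summary(secs / 3600)  # wallclock hours per task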
The combined files land in a directory on the cluster, determined by the combine_pr_files.R script. You can leave them there or zip and transfer them, but go into the plot subdirectory to find make_maps.Rmd, which makes maps from those files. You’ll point a variable in that notebook at the location of the files.
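Something like this at the top of the Rmd, where pr_dir is a hypothetical stand-in for whatever the variable is actually called:
pr_dir <- "/ihme/malaria_modeling/projects/uganda2020/outputs/practice_draws/"  # point at wherever combine_pr_files.R wrote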
If you want to run under RStudio on the cluster, then create a ~/.Rprofile script and export R_PROFILE=$HOME/.Rprofile in your .bashrc. That script should point to a library path that matches the current version of R in the R singularity image. That’s ~/share/R3.6.3 right now.
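Following the same pattern as the .Rprofile shown earlier, that line would be:
.libPaths(c("~/share/R3.6.3", .libPaths()))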
Run RStudio with this qsub for 2 threads, 10 GB of memory, 12 hours, on the interactive queue. Yes, they changed the flag specifications from the way qsub handles them. Read the instructions and then try a bunch of times until it works.
/ihme/singularity-images/rstudio/shells/jpy_rstudio_qsub_script.sh -t rstudio -i /ihme/singularity-images/rstudio/ihme_rstudio_3630.img -f 2 -o 2 -m 10G -h 12 -P proj_mmc -q i -l /share/temp/sgeoutput/$USER/errors
When you submit, the script tells you the port number it will try to give you, although that can fail, of course. Then run qstat, which might look like this:
$ qstat
job-ID prior name user state submit/start at queue jclass slots ja-task-ID
------------------------------------------------------------------------------------------------------------------------------------------------
41622257 0.50026 QLOGIN adolgert r 05/10/2020 16:46:07 i.q@int-uge-archive-p005.clust 1
41624723 0.50023 rst_ide_20 adolgert r 05/10/2020 17:12:22 i.q@int-uge-archive-p004.clust 1
That tells me to connect my web browser to int-uge-archive-p004.cluster.ihme.washington.edu:6232, because that’s the port number associated with my name. It may be someone else’s too, but whatever.