Using R in the HPC

This page details using R in the biostat partition of the KUHPC.

Setting up library paths

Using R on the HPC requires that every package your code depends on is installed. Rather than each user repeatedly installing the same commonly used packages, we encourage users to set R_LIBS_SITE to the shared package library for the biostat partition. This reduces the number of packages you need to install in your own folder.

Tip: Benefit of shared R packages

Using the shared R package library means you don't need to install commonly used packages in your own personal folder. This saves you setup time and gives you immediate access to a curated set of packages without any installation steps. It also benefits the whole biostat partition by reducing redundant storage use.

Use one of the following two options to access these shared R packages:

Using R_LIBS_SITE within current environment

To use the packages that are already installed in the shared workspace for the biostat partition, set the environment variable R_LIBS_SITE. When R is launched, it will search that location for installed packages. This does not prevent you from installing a package yourself if it is not in the shared library, but it lets you use common packages without reinstalling them in your own personal folder. With this option, you must run the following command every time you open a new terminal or connect to a new node.

export R_LIBS_SITE=/kuhpc/work/biostat/sw/R/4.4

Users can then load R and will have access to the shared packages. To check from within R, run the following command, which lists the library paths that R will search.

.libPaths()

Warning: This must be done at each new login or node

Whether you are logging in again or entering a new node for a submitted job, you must export R_LIBS_SITE again each time. Otherwise, R will not search the shared library and will only find packages in its default locations, such as your own personal library. If you are getting errors due to missing packages, it is always worth running the above command in R to see whether the path to the shared R packages appears in the output.
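The same check can be made from the shell before R is launched. The following is a minimal sketch, not part of the partition's tooling: the function name check_r_libs_site is hypothetical, and it assumes R_LIBS_SITE holds a single directory rather than a colon-separated list.

```shell
# Hypothetical helper: report whether R_LIBS_SITE is set and points to a directory.
# Assumes R_LIBS_SITE contains a single path, as in the export command on this page.
check_r_libs_site() {
    if [ -z "${R_LIBS_SITE:-}" ]; then
        echo "R_LIBS_SITE is not set"
        return 1
    fi
    if [ ! -d "$R_LIBS_SITE" ]; then
        echo "R_LIBS_SITE is set but is not a directory: $R_LIBS_SITE"
        return 1
    fi
    echo "R_LIBS_SITE is set to $R_LIBS_SITE"
}
```

Running check_r_libs_site after the export command should print the shared path. A non-zero exit status suggests the variable needs to be exported again in the current session.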

Setting R_LIBS_SITE permanently

Rather than running this command each time, users are encouraged to set this environment variable permanently in a profile file called the .bashrc. For more information about this type of file, here is a general overview of the file type.

Users only have to do this process one time. To put this environment variable in your .bashrc, run the following:

nano ~/.bashrc

which will open the file to edit. You will then paste the following in the file:

export R_LIBS_SITE=/kuhpc/work/biostat/sw/R/4.4

and then save and close the file. If you run cat ~/.bashrc you should see the line that you added, which means the variable has been added successfully. For the current session only, you will need to either run source ~/.bashrc or reopen the terminal for the variable to take effect. In later login sessions, the variable will be set automatically.
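As an alternative to editing the file by hand, the line can be appended from the command line. The sketch below is one possible approach rather than a required step; the function name add_line_once is hypothetical, and it only adds the line when an identical line is not already in the file, so running it twice does not create duplicates.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not already present.
# $1 = line to add, $2 = file to add it to
add_line_once() {
    touch "$2"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}
```

For example, add_line_once 'export R_LIBS_SITE=/kuhpc/work/biostat/sw/R/4.4' ~/.bashrc would add the export line to your .bashrc the first time it is run and leave the file unchanged after that.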

Warning: Important note for submitted jobs

If you set the environment variable in your .bashrc file in this way, you must source your .bashrc in the submitted shell script. This would look something like this:

#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --output=test_job_%j.out
#SBATCH --error=test_job_%j.err
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G

source ~/.bashrc

module load R

Rscript your_r_script.R

Note that the script runs source ~/.bashrc before loading R so that the submitted job uses your saved environment variables, including the location of the shared R packages.
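If a job still cannot find the shared packages, one defensive option is to add a fallback after the source line. This is a sketch under the assumption that the shared library path is the one given on this page; it keeps any value that is already set and only supplies the default when R_LIBS_SITE is empty or unset.

```shell
# Place after `source ~/.bashrc` in the job script.
# Keep an existing value; otherwise fall back to the shared library path from this page.
: "${R_LIBS_SITE:=/kuhpc/work/biostat/sw/R/4.4}"
export R_LIBS_SITE

# Record the value in the job's output file to make missing-package errors easier to trace.
echo "R_LIBS_SITE=$R_LIBS_SITE"
```

Printing the value into the job's output file gives you a record of which library path the job actually used.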