Snakemake

Question

In snakemake, you can call external scripts like so:

rule NAME:
    input:
        "path/to/inputfile",
        "path/to/other/inputfile"
    output:
        "path/to/outputfile",
        "path/to/another/outputfile"
    script:
        "path/to/script.R"

This gives the R script convenient access to an S4 object named snakemake. In my case, however, I am running snakemake on a SLURM cluster, and I need to load R with module load R/3.6.0 before Rscript can be executed; otherwise the job fails with:

/usr/bin/bash: Rscript: command not found

How can I tell snakemake to load the module first? If I run the rule with shell instead of script, my R script unfortunately has no access to the snakemake object, so this is not a viable solution:

shell:
    "module load R/3.6.0;"
    "Rscript path/to/script.R"
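For context, the error above means that the shell running the job cannot find an executable named Rscript in any directory listed in PATH. Loading a module typically works by prepending the R installation's bin directory to PATH. The Python sketch below illustrates only that lookup, using shutil.which with an explicit search path; rscript_on_path is a hypothetical helper, and the temporary directory is a stand-in for a module's bin directory, not a real R installation.

```python
import os
import shutil
import stat
import tempfile


def rscript_on_path(search_path: str) -> bool:
    """Return True if an executable named 'Rscript' is found on search_path.

    search_path has the same format as the PATH variable
    (directories separated by os.pathsep).
    """
    return shutil.which("Rscript", path=search_path) is not None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as before, tempfile.TemporaryDirectory() as r_bin:
        # Hypothetical stand-in for the bin directory a module would add to PATH.
        fake = os.path.join(r_bin, "Rscript")
        with open(fake, "w") as fh:
            fh.write("#!/bin/sh\n")
        # which() only reports files that are executable.
        os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)

        print(rscript_on_path(before))                       # False: not loaded yet
        print(rscript_on_path(r_bin + os.pathsep + before))  # True: "module" prepended
```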

Eric C. · Accepted Answer

The script directive gives you no way to run a shell command such as module load before Rscript is invoked, so use the shell directive instead. You can pass your inputs and outputs to the R script as command-line arguments:

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    shell:
        """
        module load R/3.6.0
        Rscript path/to/script.R {input.in1} {input.in2} {output.out1} {output.out2}
        """

and read them back in the R script, in the order they appear on the command line:

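# Arguments after the script name, in the order used in the shell command
args <- commandArgs(trailingOnly = TRUE)
# Fail early if the rule and the script disagree on the argument count
stopifnot(length(args) == 4)
inFile1 <- args[1]
inFile2 <- args[2]
outFile1 <- args[3]
outFile2 <- args[4]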
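Snakemake fills in the named placeholders such as {input.in1} before it hands the command to the shell, so the order of the placeholders in the command is the order in which the script receives its positional arguments. As a rough Python approximation only (Snakemake uses its own formatter, which also handles wildcards and other cases; SimpleNamespace is merely a stand-in for its input and output objects), plain str.format with attribute access produces the same command line for these paths:

```python
from types import SimpleNamespace

# Stand-ins for Snakemake's input/output objects; only named attribute
# access is modelled here, not Snakemake's full formatting rules.
inputs = SimpleNamespace(in1="path/to/inputfile", in2="path/to/other/inputfile")
outputs = SimpleNamespace(out1="path/to/outputfile", out2="path/to/another/outputfile")

template = (
    "module load R/3.6.0\n"
    "Rscript path/to/script.R {input.in1} {input.in2} {output.out1} {output.out2}"
)

# Substitute each placeholder with the matching attribute value.
command = template.format(input=inputs, output=outputs)
print(command)
```

The second line of the result passes the four paths as arguments 1 to 4, which is exactly what args[1] through args[4] pick up on the R side.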

Alternatively, use a conda environment:

You can specify a conda environment for an individual rule; this keeps the script directive, and with it the snakemake object:

rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    conda: "r.yml"
    script:
        "path/to/script.R"

and in your r.yml file:

name: rEnv
channels:
  - r
dependencies:
  - r-base=3.6

Then when you run snakemake:

snakemake .... --use-conda

Snakemake creates all environments before running any jobs, and activates the rule's environment inside each job submitted to SLURM.
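Newer Snakemake releases also provide an envmodules directive, which loads environment modules for a rule and can be combined with script, so the R script keeps its snakemake object. The sketch below assumes your Snakemake version supports this directive and that the module name matches your cluster's (R/3.6.0 is taken from the question):

```python
rule NAME:
    input:
        in1="path/to/inputfile",
        in2="path/to/other/inputfile"
    output:
        out1="path/to/outputfile",
        out2="path/to/another/outputfile"
    # Modules loaded before the script runs; only used with --use-envmodules
    envmodules:
        "R/3.6.0"
    script:
        "path/to/script.R"
```

The modules are only loaded when snakemake is run with the --use-envmodules flag, in addition to whatever cluster options you already pass.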

Snakemake - load cluster modules before an external script is called

Tags: python, r, cluster-computing

bgbrink
