Redirect Snakemake slurm output files to a new directory

I'm putting together a Snakemake slurm workflow and am having trouble with my working directory becoming cluttered with slurm output files. I would like my workflow to, at a minimum, direct these files to a 'slurm' directory inside my working directory. I currently have my workflow set up as follows:

config.yaml:

reads:
    1:
    2:
samples:
    15FL1-2: /datasets/work/AF_CROWN_RUST_WORK/2020-02-28_GWAS/data/15FL1-2
    15Fl1-4: /datasets/work/AF_CROWN_RUST_WORK/2020-02-28_GWAS/data/15Fl1-4

cluster.yaml:

__default__:
    time: 0:5:0
    mem: 1G
    output: _{rule}_{wildcards.sample}_%A.slurm

fastqc_raw:
    job_name: sm_fastqc_raw
    time: 0:10:0
    mem: 1G
    output: slurm/_{rule}_{wildcards.sample}_{wildcards.read}_%A.slurm

Snakefile:

configfile: "config.yaml"
workdir: config["work"]

localrules: all

rule all:
    input:
        expand("analysis/fastqc_raw/{sample}_R{read}_fastqc.html", sample=config["samples"], read=config["reads"])

rule clean:
    shell:
        "rm -rf analysis logs"

rule fastqc_raw:
    input:
        'data/{sample}_R{read}.fastq.gz'
    output:
        'analysis/fastqc_raw/{sample}_R{read}_fastqc.html'
    log:
        err = 'logs/fastqc_raw/{sample}_R{read}.err',
        out = 'logs/fastqc_raw/{sample}_R{read}.out'
    shell:
        """
        fastqc {input} --noextract --outdir 'analysis/fastqc_raw' 2> {log.err} > {log.out}
        """

I then call with:

snakemake --jobs 4 --cluster-config cluster.yaml --cluster "sbatch --mem={cluster.mem} --time={cluster.time} --job-name={cluster.job_name} --output={cluster.output}"

This does not work, because the slurm directory does not already exist and sbatch will not create it. I don't want to make the directory manually before running my snakemake command; that will not scale. Things I've tried, after reading every related question, are:

1) simply trying to capture all the output via the log directives within the rule, and setting cluster.output='/dev/null'. This doesn't work: the information in the slurm output file isn't captured by the rule's logs, because it's not output of the rule itself but information about the job (sketch below).
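
The cluster.yaml change for this attempt looked roughly like this (a sketch, not my exact file):

__default__:
    time: 0:5:0
    mem: 1G
    output: /dev/null    # discard the slurm job log entirely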

2) forcing the directory to be created by adding a dummy log:

    log:
        err = 'logs/fastqc_raw/{sample}_R{read}.err',
        out = 'logs/fastqc_raw/{sample}_R{read}.out',
        jobOut = 'slurm/out.err'

I think this doesn't work because sbatch needs the slurm folder to exist at submission time, before the rule runs and its log directories are created.

3) allowing the files to be created in the working directory, and adding bash code at the end of the rule to move them into a slurm directory (roughly the addition sketched below). I believe this doesn't work because the move runs before the job has finished writing to the slurm output file.
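
The addition to the rule's shell block was roughly this (a sketch, assuming the default output pattern from my cluster.yaml):

    shell:
        """
        fastqc {input} --noextract --outdir 'analysis/fastqc_raw' 2> {log.err} > {log.out}
        # slurm has not finished writing the job log at this point, so this moves an incomplete file
        mkdir -p slurm
        mv _fastqc_raw_*.slurm slurm/
        """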

Any further ideas or tricks?

asked Nov 07 '22 by Ensa
1 Answer

You should be able to suppress these outputs by calling sbatch with --output=/dev/null --error=/dev/null. Something like this:

snakemake ... --cluster "sbatch --output=/dev/null --error=/dev/null ..."

If you want the files to go to a directory of your choosing, you can of course change the call to reflect that:

snakemake ... --cluster "sbatch --output=/home/Ensa/slurmout/%j.out --error=/home/Ensa/slurmout/%j.out ..."
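
Note that sbatch does not create missing directories for its output files, so the target directory has to exist when the job is submitted. One way to avoid creating it by hand (an untested sketch; the slurmout path is just an example) is to prepend a mkdir -p to the cluster command, since Snakemake appends the jobscript to whatever submission command you give it:

snakemake ... --cluster "mkdir -p /home/Ensa/slurmout && sbatch --output=/home/Ensa/slurmout/%j.out --error=/home/Ensa/slurmout/%j.out ..."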
answered Dec 18 '22 by Maarten-vd-Sande