I'm putting together a Snakemake SLURM workflow and am having trouble with my working directory becoming cluttered with SLURM output files. I would like my workflow to, at a minimum, direct these files into a 'slurm' directory inside my working directory. I currently have my workflow set up as follows:
config.yaml:
reads:
  1:
  2:
samples:
  15FL1-2: /datasets/work/AF_CROWN_RUST_WORK/2020-02-28_GWAS/data/15FL1-2
  15Fl1-4: /datasets/work/AF_CROWN_RUST_WORK/2020-02-28_GWAS/data/15Fl1-4
cluster.yaml:
localrules: all

__default__:
  time: 0:5:0
  mem: 1G
  output: _{rule}_{wildcards.sample}_%A.slurm

fastqc_raw:
  job_name: sm_fastqc_raw
  time: 0:10:0
  mem: 1G
  output: slurm/_{rule}_{wildcards.sample}_{wildcards.read}_%A.slurm
Snakefile:
configfile: "config.yaml"

workdir: config["work"]

rule all:
    input:
        expand("analysis/fastqc_raw/{sample}_R{read}_fastqc.html", sample=config["samples"], read=config["reads"])

rule clean:
    shell:
        "rm -rf analysis logs"

rule fastqc_raw:
    input:
        'data/{sample}_R{read}.fastq.gz'
    output:
        'analysis/fastqc_raw/{sample}_R{read}_fastqc.html'
    log:
        err = 'logs/fastqc_raw/{sample}_R{read}.err',
        out = 'logs/fastqc_raw/{sample}_R{read}.out'
    shell:
        """
        fastqc {input} --noextract --outdir 'analysis/fastqc_raw' 2> {log.err} > {log.out}
        """
I then call with:
snakemake --jobs 4 --cluster-config cluster.yaml --cluster "sbatch --mem={cluster.mem} --time={cluster.time} --job-name={cluster.job_name} --output={cluster.output}"
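For a fastqc_raw job (say sample 15FL1-2, read 1), the {cluster.*} placeholders should resolve to something like:

sbatch --mem=1G --time=0:10:0 --job-name=sm_fastqc_raw --output=slurm/_fastqc_raw_15FL1-2_1_%A.slurm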
This does not work, because sbatch fails when the slurm directory named in --output does not already exist. I don't want to create that directory manually before every run; that won't scale. Things I've tried, after reading every related question, are:
1) Simply capturing all the output via the log within the rule and setting cluster.output=/dev/null. This doesn't work: the information in the SLURM output file isn't output of the rule's commands, it's information about the job itself, so the rule's log never captures it.
2) Forcing the directory to be created by adding a dummy log:

log:
    err = 'logs/fastqc_raw/{sample}_R{read}.err',
    out = 'logs/fastqc_raw/{sample}_R{read}.out',
    jobOut = 'slurm/out.err'
I think this doesn't work because sbatch needs the slurm folder to exist at submission time, before the rule (and Snakemake's automatic creation of log directories) ever runs.
3) Allowing the files to be created in the working directory and adding bash code at the end of the rule to move them into a slurm directory. I believe this doesn't work because the move runs before SLURM has finished writing to the job's output file.
Any further ideas or tricks?
You should be able to suppress these outputs by calling sbatch with --output=/dev/null --error=/dev/null. Something like this:

snakemake ... --cluster "sbatch --output=/dev/null --error=/dev/null ..."
If you want the files to go to a directory of your choosing you can of course change the call to reflect that:
snakemake ... --cluster "sbatch --output=/home/Ensa/slurmout/%j.out --error=/home/Ensa/slurmout/%j.out ..."
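Note that sbatch will not create that directory for you, so it still has to exist before jobs are submitted. One way around that (a sketch; the relative slurmout path here is just an example) is to create it as part of the submit command itself, since the --cluster string is run through a shell for every submission:

snakemake ... --cluster "mkdir -p slurmout && sbatch --output=slurmout/%j.out --error=slurmout/%j.err ..."

Once the directory exists, the mkdir -p is a harmless no-op on subsequent submissions.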