I have a rule that iterates over a file, pulls out the Fastq file paths, and runs trimGalore on the Fastq files. However, some of the files are corrupted or truncated, so trimGalore fails to process them. It continues to run on the remaining files, but the overall rule fails and the output folder is deleted, including the successfully processed files. How do I retain the output folder?
I tried altering the shell command to ignore the exit status, but Snakemake seems to enforce set -euo pipefail within the shell element of the rule.
rule trimGalore:
    """
    This module takes in the temporary file created by parse sampleFile rule and determines if libraries are single end or paired end.
    The appropriate step for trimGalore is then ran and a summary of the runs is produced in summary_tg.txt
    """
    input:
        rules.parse_sampleFile.output[1]+"singleFile.txt", rules.parse_sampleFile.output[1]+"pairFile.txt"
    output:
        directory(projectDir+"/trimmed_reads/")
    log:
        projectDir+"/logs/"+stamp+"_trimGalore.log"
    params:
        p = trimGaloreParams
    shell:
        """
        (awk -F "," '{{print $2}}' {input[0]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore {params.p} --gzip -o {output} $i; done
        awk -F "," '{{print $2" "$3}}' {input[1]} |while read i; do echo $(date +"%Y-%m-%d %H:%M:%S") >>{log}; echo "$USER">>{log}; trim_galore --paired {params.p} --gzip -o {output} $i; done) 2>>{log}
        """
I am happy for it to continue processing the remaining Fastq files if one fails, but I want the rule's output folder to be kept when the job finishes with a failure, so that I can carry on working with the non-truncated files.
Currently, your rule declares the entire directory as its output, so if any error occurs along the way, Snakemake considers the job as a whole failed and discards the output (i.e. your entire folder).
The solution I could think of would be related to this section of the Snakemake docs, and the one just below it on Functions as input.
def myfunc(wildcards):
    return [... a list of input files depending on given wildcards ...]

rule:
    input: myfunc
    output: "someoutput.{somewildcard}.txt"
    shell: "..."
With this you could try iterating over your file so that Snakemake creates one job per Fastq. Then, if an individual job fails, only that job's output file is removed and the results of the other jobs are kept.
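As a rough, untested sketch of that idea for the single-end file: the helper below reads the comma-separated sample file (Fastq path in the second column, as the awk call in the question implies) and maps a sample name to each path, so a rule can request one output per sample. The names read_single_end_samples, SAMPLES and trimGalore_single are made up for illustration, and the "_trimmed.fq.gz" output name is an assumption about how Trim Galore names single-end results. It also assumes singleFile.txt already exists when the Snakefile is parsed; if it is only produced by parse_sampleFile during the same run, a checkpoint would probably be needed instead.

```python
import csv
from pathlib import Path


def read_single_end_samples(sample_file):
    """Map a sample name to its Fastq path, one entry per usable row.

    Assumes comma-separated rows with the Fastq path in the second column.
    """
    samples = {}
    with open(sample_file, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2 or not row[1].strip():
                continue  # skip blank or malformed rows
            fastq = row[1].strip()
            name = Path(fastq).name
            # Strip a common Fastq extension to get the sample name.
            for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
                if name.endswith(suffix):
                    name = name[: -len(suffix)]
                    break
            samples[name] = fastq
    return samples


# Hypothetical use inside the Snakefile (Snakemake syntax, not plain Python):
#
# SAMPLES = read_single_end_samples("singleFile.txt")
#
# rule all:
#     input:
#         expand(projectDir + "/trimmed_reads/{sample}_trimmed.fq.gz", sample=SAMPLES)
#
# rule trimGalore_single:
#     input:
#         lambda wildcards: SAMPLES[wildcards.sample]
#     output:
#         projectDir + "/trimmed_reads/{sample}_trimmed.fq.gz"
#     params:
#         p = trimGaloreParams
#     shell:
#         "trim_galore {params.p} --gzip -o $(dirname {output}) {input}"
```

Because every Fastq becomes its own job, running snakemake with --keep-going should let the independent jobs finish even when a truncated file makes one of them fail, and only the failed job's output is discarded.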
Disclaimer: This is something I just learned and haven't tried yet, but it will be useful to me as well!