Apologies that the title is bad - I can't figure out how best to explain my issue in a few words. I'm having trouble dealing with downstream rules in snakemake when one of the rules fails. In the example below, rule spades fails on some samples. This is expected because some of my input files will have issues, spades will return an error, and the target file is not generated. This is fine until I get to rule eval_ani. Here I basically want to run this rule on all of the successful output of rule ani. But I'm not sure how to do this because I have effectively dropped some of my samples in rule spades. I think using snakemake checkpoints might be useful but I just can't figure out how to apply it from the documentation.
I'm also wondering if there is a way to re-run rule ani without re-running rule spades. Say I prematurely terminated my run, and rule ani didn't run on all the samples. Now I want to re-run my pipeline, but I don't want snakemake to try to re-run all the failed spades jobs because I already know they won't be useful to me and it would just waste resources. I tried -R and --allowed-rules but neither of these does what I want.
rule spades:
input:
read1=config["fastq_dir"]+"combined/{sample}_1_combined.fastq",
read2=config["fastq_dir"]+"combined/{sample}_2_combined.fastq"
output:
contigs=config["spades_dir"]+"{sample}/contigs.fasta",
scaffolds=config["spades_dir"]+"{sample}/scaffolds.fasta"
log:
config["log_dir"]+"spades/{sample}.log"
threads: 8
shell:
"""
python3 {config[path_to_spades]} -1 {input.read1} -2 {input.read2} -t 16 --tmp-dir {config[temp_dir]}spades_test -o {config[spades_dir]}{wildcards.sample} --careful > {log} 2>&1
"""
rule ani:
input:
config["spades_dir"]+"{sample}/scaffolds.fasta"
output:
"fastANI_out/{sample}.txt"
log:
config["log_dir"]+"ani/{sample}.log"
shell:
"""
fastANI -q {input} --rl {config[reference_dir]}ref_list.txt -o fastANI_out/{wildcards.sample}.txt
"""
rule eval_ani:
input:
expand("fastANI_out/{sample}.txt", sample=samples)
output:
"ani_results.txt"
log:
config["log_dir"]+"eval_ani/{sample}.log"
shell:
"""
python3 ./bin/evaluate_ani.py {input} {output} > {log} 2>&1
"""
If I understand correctly, you want to allow spades to fail without stopping the whole pipeline and you want to ignore the output files from spades that failed. For this you could append to the command running spades || true
to catch the non-zero exit status (so snakemake will not stop). Then you could analyse the output of spades and write to a "flag" file whether that sample succeded or not. So the rule spades would be something like:
rule spades:
input:
read1=config["fastq_dir"]+"combined/{sample}_1_combined.fastq",
read2=config["fastq_dir"]+"combined/{sample}_2_combined.fastq"
output:
contigs=config["spades_dir"]+"{sample}/contigs.fasta",
scaffolds=config["spades_dir"]+"{sample}/scaffolds.fasta",
exit= config["spades_dir"]+'{sample}/exit.txt',
log:
config["log_dir"]+"spades/{sample}.log"
threads: 8
shell:
"""
python3 {config[path_to_spades]} ... || true
# ... code that writes to {output.exit} stating whether spades succeded or not
"""
For the following steps, you use the flag files '{sample}/exit.txt'
to decide which spade files should be used and which should be discarded. For example:
rule ani:
input:
spades= config["spades_dir"]+"{sample}/scaffolds.fasta",
exit= config["spades_dir"]+'{sample}/exit.txt',
output:
"fastANI_out/{sample}.txt"
log:
config["log_dir"]+"ani/{sample}.log"
shell:
"""
if {input.exit} contains 'PASS':
fastANI -q {input.spades} --rl {config[reference_dir]}ref_list.txt -o fastANI_out/{wildcards.sample}.txt
else:
touch {output}
"""
rule eval_ani:
input:
ani= expand("fastANI_out/{sample}.txt", sample=samples),
exit= expand(config["spades_dir"]+'{sample}/exit.txt', sample= samples),
output:
"ani_results.txt"
log:
config["log_dir"]+"eval_ani/{sample}.log"
shell:
"""
# Parse list of file {input.exit} to decide which files in {input.ani} should be used
python3 ./bin/evaluate_ani.py {input} {output} > {log} 2>&1
"""
EDIT (not tested) Instead of || true
inside the shell
directive it may be better to use the run
directive and use python's subprocess
to run the system commands that are allowed to fail. The reason is that || true
will return 0 exit code no matter what error happened; the subprocess
solution instead allows more precise handling of exceptions. E.g.
rule spades:
input:
...
output:
...
run:
cmd = "spades ..."
p = subprocess.Popen(cmd, shell= True, stdout= subprocess.PIPE, stderr= subprocess.PIPE)
stdout, stderr= p.communicate()
if p.returncode == 0:
print('OK')
else:
# Analyze exit code and stderr and decide what to do next
print(p.returncode)
print(stderr.decode())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With