When a file that was made earlier in the pipeline is removed, Snakemake does not seem to consider that a problem, as long as later files are there:
rule All:
    input: "testC1.txt", "testC2.txt"

rule A:
    input: "{X}{Y}.txt"
    output: "{X}A{Y}.txt"
    shell: "cp {input} {output}"

rule B:
    input: "{X}A{Y}.txt"
    output: "{X}B{Y}.txt"
    shell: "cp {input} {output}"

rule C:
    input: "{X}B{Y}.txt"
    output: "{X}C{Y}.txt"
    shell: "cp {input} {output}"
Save this Snakefile as test.sf and do this:
rm testA*.txt testB*.txt testC*.txt
echo "test1" >test1.txt
echo "test2" >test2.txt
snakemake -s test.sf
# Rerun:
snakemake -s test.sf
# Snakemake says all is up to date, which it is.
# Remove intermediate results:
rm testA1.txt
# Rerun:
snakemake -s test.sf
Snakemake says all is up to date. It does not detect the missing testA1.txt.
I seem to recall something about this in the online Snakemake manual, but I can no longer find it.
I assume this is expected Snakemake behavior. It can sometimes be desired behavior, but sometimes you may want Snakemake to detect and rebuild the missing file. How can this be done?
As mentioned in this other answer, the -R parameter can help, but there are more options:
When you call snakemake -F, this will trigger a rebuild of the whole pipeline. This basically means: forget all intermediate files and start anew. It will definitely (re)generate all intermediate files along the way. The downside: it might take some time.
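For the running example, that is (a sketch; the comments describe the expected effect):
snakemake -F -s test.sf
# Expected: testA*.txt, testB*.txt and testC*.txt are all regenerated
# from test1.txt and test2.txt, regardless of what already exists.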
This is the realm of the -R <rule> parameter, which re-runs the given rule and all rules that depend on it. So in your case
snakemake -R A -s test.sf
would rerun rule A (to build testA1.txt from test1.txt) and the rules B, C and All, since they depend on A. Mind that this runs all instances of rule A that are required, so in your example testA2.txt and everything that follows from it is also rebuilt.
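You can preview what -R would schedule before committing to it (a sketch; -n is Snakemake's dry-run flag):
snakemake -n -R A -s test.sf
# Dry run: lists the jobs for rules A, B, C and All that would be executed,
# without actually building anything.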
If, in your example, you had removed testB1.txt instead, only rules B and C would have been rerun.
If I remember correctly, Snakemake detects whether a file needs to be rebuilt by its modification time (mtime). So if you have a version of testA1.txt that is newer (as in more recently modified) than testB1.txt, then testB1.txt has to be rebuilt using rule B to ensure everything is up to date. Hence, you cannot easily rebuild only testA1.txt without also rebuilding all following files, unless you somehow change the files' mtimes.
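This mtime behaviour is easy to observe with a dry run (a sketch; nothing is rebuilt):
touch testA1.txt          # make testA1.txt newer than testB1.txt
snakemake -n -s test.sf   # dry run: jobs for B, C and All should now be listed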
I have not tried this out, but this can be done with Snakemake's --touch parameter. If you manage to run only rule A and then run snakemake -R B -t, which touches all output files of rule B and the following rules, you could get a valid workflow state without actually rerunning all the steps in between.
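A possible sequence for the example, equally untested, is to request the missing file directly as a target (file targets on the command line are standard Snakemake usage) and then touch everything downstream:
snakemake -s test.sf testA1.txt   # rebuild only the missing intermediate
snakemake -s test.sf -R B -t      # touch the outputs of rule B and onwards
snakemake -s test.sf              # should now report everything up to date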
I found this thread a while ago about the --forcerun/-R parameter that might be informative.
Ultimately, Snakemake will force execution of the entire pipeline if you want to regenerate that intermediate file, unless you have a separate rule for it or include it as a target in rule All.
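A concrete way to do the latter (my sketch, not part of the original answer) is to list the intermediates as inputs of rule All, so that a missing testA1.txt makes the default target out of date:
rule All:
    input:
        "testC1.txt", "testC2.txt",
        # listing intermediates makes their absence trigger a rebuild:
        "testA1.txt", "testA2.txt",
        "testB1.txt", "testB2.txt"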