I have a workflow that begins by downloading files from a public database, and then in subsequent steps processes these files to create several aggregated data tables.
I’m testing the workflow on a machine with no internet connection. I ran the preliminary data download steps on another machine and copied the resulting files over to this machine, and now I’m trying to run the rest of the workflow. When I run snakemake -np, it reports that all of the data download jobs still need to be run, even though the target files already exist. I’ve even marked these files as ancient() in the subsequent processing rules, but this doesn’t help.
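A simplified sketch of the setup (the rule names, paths, and URL here are placeholders, not my real ones):

    rule download:
        # fetches a file from the public database (URL is a stand-in)
        output:
            "data/raw.csv"
        shell:
            "curl -L -o {output} https://example.org/raw.csv"

    rule aggregate:
        # ancient() tells Snakemake to ignore this input's timestamp
        input:
            ancient("data/raw.csv")
        output:
            "tables/agg.tsv"
        shell:
            "sort {input} > {output}"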
How can I convince Snakemake that these jobs don’t need to be re-run?
A Snakemake workflow is defined by specifying rules in a Snakefile. Rules decompose the workflow into small steps (for example, the application of a single tool) by specifying how to create sets of output files from sets of input files.
If there are dependencies, I have found that only --until works: to run just rule c, use snakemake -R --until c. If there are implicit dependencies, such as shared input or output paths, Snakemake will force the upstream rules to run unless you use --until. Always run first with -n for a dry run.
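Following that suggestion with the hypothetical rule names from the sketch in the question, a restricted dry run would look like this:

    # dry run (-n) with printed shell commands (-p);
    # force rule 'aggregate' to rerun and stop the DAG at it
    snakemake -np -R aggregate --until aggregate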
Note that the shell commands refer to {output} and {input} instead of repeating the file names. This good habit of writing things out only once is known as the “Don't Repeat Yourself” (DRY) principle. {output} is a placeholder that expands to the value specified for the rule's output, and {input} is another placeholder that expands to all of the inputs of the current rule.
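Concretely, for the hypothetical aggregate rule sketched in the question, Snakemake substitutes the placeholders before running the command, so the shell line executes as:

    # {input} -> data/raw.csv, {output} -> tables/agg.tsv
    sort data/raw.csv > tables/agg.tsv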
The --reason flag prints the reason each job is scheduled to run.
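Combined with a dry run, this shows why Snakemake insists on re-running the download jobs:

    # dry run, print shell commands, and explain why each job is scheduled
    snakemake -np --reason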