
Snakemake wants to run job although output file already exists

I have a workflow that begins by downloading files from a public database, and then in subsequent steps processes these files to create several aggregated data tables.

I’m testing the workflow on a machine with no internet connection. I ran the preliminary data download steps on another machine and copied the downloaded files over to this machine, and now I’m trying to run the rest of the workflow. When I run snakemake -np, it reports that all of the data download jobs still need to be run, even though their target files already exist. I’ve even marked these files as ancient() in the subsequent processing rules, but this doesn’t help.

How can I convince Snakemake that these jobs don’t need to be re-run?

Daniel Standage asked Sep 19 '18 17:09




1 Answer

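The --reason flag prints the reason each job is scheduled for execution. Running snakemake -np --reason shows why Snakemake thinks the download jobs still need to be run.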
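Two reasons Snakemake commonly reports are missing output files and input files that were updated after the output. As a rough way to check those two conditions by hand on copied files, here is a minimal sketch. The function name `why_rerun` and the example paths are hypothetical, and this is only an illustration of the timestamp comparison, not Snakemake's own logic, which considers other triggers as well.

```python
import os

def why_rerun(inputs, outputs):
    """Return plain-language reasons a job might look outdated (hypothetical helper)."""
    if not outputs:
        return []
    # An output path that does not exist where the rule expects it.
    missing = [p for p in outputs if not os.path.exists(p)]
    if missing:
        return ["missing output: " + ", ".join(missing)]
    # An input modified more recently than the oldest output.
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    newer = [
        p for p in inputs
        if os.path.exists(p) and os.path.getmtime(p) > oldest_output
    ]
    if newer:
        return ["input newer than output: " + ", ".join(newer)]
    return []
```

For example, `why_rerun(["config.yaml"], ["data/sample1.fastq"])` (placeholder paths) returns an empty list when the output exists and is at least as new as the input. Comparing its result with what --reason prints can help tell a path mismatch from a timestamp problem introduced by copying.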

Manavalan Gajapathy answered Nov 03 '22 00:11