Consider this snakefile:
def rdf(fn):
    with open(fn, "rt") as f:
        return f.readlines()

rule a:
    output: "test.txt"
    input: "test.dat"
    params: X=lambda wildcards, input, output, threads, resources: rdf(input[0])
    message: "X is {params.X}"
    shell: "cp {input} {output}"

rule b:
    output: "test.dat"
    shell: "echo 'hello world' >{output}"
When I run it and neither test.txt nor test.dat exists, it fails with this error:
InputFunctionException in line 7 of /Users/tedtoal/Documents/BioinformaticsConsulting/Mars/Cacao/Pipeline/SnakeMake/t2:
FileNotFoundError: [Errno 2] No such file or directory: 'test.dat'
However, if test.dat exists, it runs fine. Why?
I would have expected params not to be evaluated until Snakemake was ready to run rule 'a'. Instead, it must call the params function rdf() above during the DAG phase, before running rule 'a'. And yet the following works, even when test.dat does not exist initially:
import os

def rdf(fn):
    if not os.path.exists(fn):
        return ""
    with open(fn, "rt") as f:
        return f.readlines()

rule a:
    output: "test.txt"
    input: "test.dat"
    params: X=lambda wildcards, input, output, threads, resources: rdf(input[0])
    message: "X is {params.X}"
    shell: "cp {input} {output}"

rule b:
    output: "test.dat"
    shell: "echo 'hello world' >{output}"
This implies that the params are evaluated twice: once during the DAG phase and once during the rule execution phase. Why?
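A slightly tidier version of the existence guard, as a minimal sketch: `read_param_text` is a hypothetical helper (not part of Snakemake) that returns a placeholder while the file is missing and otherwise returns the file's non-blank lines joined by spaces, which drops the trailing newlines that `readlines()` keeps. Whether space-joined text is the right shape for the real program's arguments is an assumption.

```python
from pathlib import Path


def read_param_text(fn, placeholder=""):
    """Return the file's non-blank lines joined by spaces, or `placeholder`
    if the file does not exist yet (hypothetical helper).

    The missing-file branch covers the case where a params function is called
    while the DAG is built, before the rule producing the file has run.
    """
    path = Path(fn)
    if not path.exists():
        return placeholder
    with path.open("rt") as f:
        # strip() removes the newline each line would otherwise carry
        return " ".join(line.strip() for line in f if line.strip())


# Hypothetical Snakefile usage:
#     params: X=lambda wildcards, input: read_param_text(input[0])
```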
This is a problem for me. I need to read data from the rule's input file to formulate the arguments for the program being executed. The command does not receive the input filename itself; instead, it gets arguments derived from the contents of the input file. I can handle it as above, but this seems kludgy, and I wonder whether there is a bug or I'm missing something.
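One way to avoid the guard entirely is to read the file only when the job runs. A minimal sketch, assuming the input file holds one argument per line (a hypothetical format): `args_from_file` (a hypothetical helper) turns those lines into a shell-quoted argument string, and the commented Snakefile fragment calls it from a `run:` block, whose body Snakemake executes when the job itself runs rather than while the DAG is built. `some_program` is a placeholder name, and the fragment assumes `shell()` inside `run:` can format the local variable `args`, as the Snakemake documentation describes.

```python
import shlex
from pathlib import Path


def args_from_file(fn):
    """Build a shell-safe argument string from a file with one argument per
    line; blank lines are ignored (hypothetical helper and file format)."""
    with Path(fn).open("rt") as f:
        tokens = [line.strip() for line in f if line.strip()]
    # shlex.quote protects arguments containing spaces or shell metacharacters
    return " ".join(shlex.quote(tok) for tok in tokens)


# Hypothetical Snakefile usage: the run: body is executed only when the job
# runs, after test.dat has been created, so no missing-file check is needed.
#
# rule a:
#     input: "test.dat"
#     output: "test.txt"
#     run:
#         args = args_from_file(input[0])
#         shell("some_program {args} > {output}")
```

If the file already holds the arguments in shell-ready form, command substitution in a plain `shell:` directive, e.g. `"some_program $(cat {input}) > {output}"`, also defers the read to execution time, because the shell evaluates it when the command runs; note that the unquoted substitution is word-split by the shell.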
I had the same issue. In my case, I could circumvent the problem by letting the function return a placeholder default when running on non-existing files.
For example, I have a rule that needs to know the number of lines in some of its input files ahead of time. Therefore, I used:
from pathlib import Path

def count_lines(bed):
    # This is necessary because in a dry-run, Snakemake evaluates the 'params'
    # directive on the (potentially non-existing) input files.
    if not Path(bed).exists():
        return -1
    total = 0
    with open(bed) as f:
        for line in f:
            total += 1
    return total

rule subsample_background:
    input:
        one = "raw/{A}/file.txt",
        two = "raw/{B}/file.txt"
    output:
        # the output must contain every wildcard used in the input
        "processed/{A}_{B}/some_output.txt"
    params:
        n = lambda wildcards, input: count_lines(input.one)
    shell:
        "run.sh -n {params.n} {input.two} > {output}"
In the dry-run (and, as the question shows, whenever the DAG is built before the input exists), the placeholder -1 is used, allowing the dry-run to "complete" successfully; when the job actually executes, the input exists and the function returns the real line count.
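If several params functions need the same guard, it can be factored out. A minimal sketch, where `placeholder_if_missing` is a hypothetical decorator (not a Snakemake feature) that returns the given placeholder whenever the file named by the first argument does not exist yet, applied here to a compact version of the `count_lines` function above.

```python
import functools
from pathlib import Path


def placeholder_if_missing(placeholder):
    """Hypothetical decorator: return `placeholder` instead of calling the
    wrapped function when its first argument names a missing file."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(fn, *args, **kwargs):
            if not Path(fn).exists():
                return placeholder
            return func(fn, *args, **kwargs)
        return wrapper
    return decorate


@placeholder_if_missing(-1)
def count_lines(bed):
    # Counts lines without loading the whole file into memory
    with open(bed) as f:
        return sum(1 for _ in f)
```

The same decorator could wrap the question's `rdf` with `placeholder_if_missing("")`, keeping the file-reading logic free of existence checks.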