Is Snakemake params function evaluated before input file existence?

Question

Consider this snakefile:

def rdf(fn):
    f = open(fn, "rt")
    t = f.readlines()
    f.close()
    return t

rule a:
    output: "test.txt"
    input: "test.dat"
    params: X=lambda wildcards, input, output, threads, resources: rdf(input[0])
    message: "X is {params.X}"
    shell: "cp {input} {output}"

rule b:
    output: "test.dat"
    shell: "echo 'hello world' >{output}"

When this is run and neither test.txt nor test.dat exists, it gives this error:

InputFunctionException in line 7 of /Users/tedtoal/Documents/BioinformaticsConsulting/Mars/Cacao/Pipeline/SnakeMake/t2:
FileNotFoundError: [Errno 2] No such file or directory: 'test.dat'

However, if test.dat exists, it runs fine. Why?

I would have expected params not to be evaluated until Snakemake was ready to run rule 'a'. Instead, it must be calling the params function rdf() above during the DAG phase, before running rule 'a'. And yet the following works, even when test.dat does not exist initially:

import os

def rdf(fn):
    if not os.path.exists(fn): return ""
    f = open(fn, "rt")
    t = f.readlines()
    f.close()
    return t

rule a:
    output: "test.txt"
    input: "test.dat"
    params: X=lambda wildcards, input, output, threads, resources: rdf(input[0])
    message: "X is {params.X}"
    shell: "cp {input} {output}"

rule b:
    output: "test.dat"
    shell: "echo 'hello world' >{output}"

This implies that the params are evaluated twice, once during the DAG phase and once during the rule execution phase. Why?

This is a problem for me. I need to read data from an input file to the rule in order to formulate arguments for the program to be executed. The command does not receive the input filename itself; instead, it gets arguments derived from the contents of the input file. I can handle it as above, but this seems kludgy, and I wonder whether there is a bug or I'm missing something.

Scholar · Accepted Answer

I had the same issue. In my case, I could circumvent the problem by letting the function return a placeholder default when running on non-existing files.

For example, I have a rule which needs to know the number of lines of some of its input files ahead of time. Therefore, I used:

from pathlib import Path

def count_lines(bed):
    # This is necessary because, in a dry-run, snakemake will evaluate the
    # 'params' directive on the (potentially non-existing) input files.
    if not Path(bed).exists():
        return -1

    # Count the lines of the existing file.
    total = 0
    with open(bed) as f:
        for line in f:
            total += 1
    return total

rule subsample_background:
    input:
        one = "raw/{A}/file.txt",
        two = "raw/{B}/file.txt"
    output:
        "processed/some_output.txt"
    params:
        n = lambda wildcards, input: count_lines(input.one)
    shell:
        "run.sh -n {params.n} {input.two} > {output}"

In a dry-run, the placeholder -1 is substituted, allowing the dry-run to "complete" successfully, while in a non-dry-run, the function returns the actual line count.
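
To make the placeholder behaviour concrete, here is a minimal sketch in plain Python (no Snakemake required) that calls such a helper twice: once before the file exists and once after. The helper name count_lines_or_default and the temporary file are hypothetical and only for illustration; the two calls merely imitate the two evaluations described in the question.

```python
from pathlib import Path
import tempfile


def count_lines_or_default(path, default=-1):
    """Return the number of lines in `path`, or `default` if it is missing."""
    p = Path(path)
    if not p.exists():
        # Assumed scenario: the function is being evaluated before the
        # rule that produces this file has run, so return a placeholder.
        return default
    with p.open() as f:
        return sum(1 for _ in f)


# Imitate the two evaluations using a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    bed = Path(tmp) / "file.txt"
    before = count_lines_or_default(bed)  # file missing: placeholder
    bed.write_text("a\nb\nc\n")
    after = count_lines_or_default(bed)   # file present: real count

print(before, after)  # prints: -1 3
```

In a Snakefile, the helper would be called from the params lambda just like count_lines above. It seems safest to pick a placeholder that cannot be mistaken for a real value, so that it stands out if it ever reaches the actual command.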

Tags: parameters, snakemake, tedtoal
