Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Snakemake using a rule in a loop

I'm trying to use Snakemake rules within a loop so that the rule takes the output of the previous iteration as input. Is that possible and if yes how can I do that?

Here is my example

  1. Setup the test data
mkdir -p test
echo "SampleA" > test/SampleA.txt
echo "SampleB" > test/SampleB.txt
  1. Snakemake
SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        # Output of the final loop
        expand("loop3/{sample}.txt", sample = SAMPLES)


#### LOOP ####
for i in list(range(1, 4)):
    # Setup prefix for input
    if i == 1:
        prefix = "test"
    else:
        prefix = "loop%s" % str(i-1)

    # Setup prefix for output
    opref =  "loop%s" % str(i)

    # Rule
    rule loop_rule:
        input:
            prefix+"/{sample}.txt"
        output:
            prefix+"/{sample}.txt"
            #expand("loop{i}/{sample}.txt", i = i, sample = wildcards.sample)
        params:
            add=prefix
        shell:
            "awk '{{print $0, {params.add}}}' {input} > {output}"

Trying to run the example yields the ERROR CreateRuleException in line 26 of /Users/fabiangrammes/Desktop/Projects/snake_loop/Snakefile: The name loop_rule is already used by another rule. If anyone spots an option to get that thing to work it would be great!

Thanks !

like image 597
Fabian_G Avatar asked May 23 '19 11:05

Fabian_G


1 Answers

I think this is a nice opportunity to use recursive programming. Rather than explicitly including conditionals for every iteration, write a single rule that transitions from iteration (n-1) to n. So, something along these lines:

SAMPLES = ["SampleA", "SampleB"]

rule all:
    input:
        expand("loop3/{sample}.txt", sample=SAMPLES)

def recurse_sample(wcs):
    n = int(wcs.n)
    if n == 1:
        return "test/%s.txt" % wcs.sample
    elif n > 1:
        return "loop%d/%s.txt" % (n-1, wcs.sample)
    else:
        raise ValueError("loop numbers must be 1 or greater: received %s" % wcs.n)

rule loop_n:
    input: recurse_sample
    output: "loop{n}/{sample}.txt"
    wildcard_constraints:
        sample="[^/]+",
        n="[0-9]+"
    shell:
        """
        awk -v loop='loop{wildcards.n}' '{{print $0, loop}}' {input} > {output}
        """

As @RussHyde said, you need to be proactive about ensuring no infinite loops are triggered. To this end, we ensure all cases are covered in recurse_sample and use wildcard_constraints to make sure the matching is precise.

like image 146
merv Avatar answered Oct 20 '22 04:10

merv