I am trying to make a simple pipeline using Snakemake to download two files from the web and then merge them into a single output.
What I thought would work is the following code:
```
dwn_lnks = {
    '1': 'https://molb7621.github.io/workshop/_downloads/sample.fa',
    '2': 'https://molb7621.github.io/workshop/_downloads/sample.fa'
}

import os

# association between chromosomes and their links
def chromo2link(wildcards):
    return dwn_lnks[wildcards.chromo]

rule all:
    input:
        os.path.join('genome_dir', 'human_en37_sm.fa')

rule download:
    output:
        expand(os.path.join('chr_dir', '{chromo}')),
    params:
        link=chromo2link,
    shell:
        "wget {params.link} -O {output}"

rule merger:
    input:
        expand(os.path.join('chr_dir', "{chromo}"), chromo=dwn_lnks.keys())
    output:
        os.path.join('genome_dir', 'human_en37_sm.fa')
    run:
        txt = open({output}, 'a+')
        with open (os.path.join('chr_dir', "{chromo}") as file:
            line = file.readline()
            while line:
                txt.write(line)
                line = file.readline()
        txt.close()
```
This code returns the error:

```
No values given for wildcard 'chromo'. in line 20
```
Also, the Python code in the `run` block of the merger rule does not work.
The tutorial that comes with Snakemake does not cover enough examples for non-computer scientists to learn the details. If anybody knows a good resource for learning how to work with Snakemake, I would appreciate it if you could share it :).
The problem is that the output of rule download calls expand without giving any values for the wildcard {chromo}, so Snakemake has nothing to fill it in with. I guess what you really want here is
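```
rule download:
    output:
        'chr_dir/{chromo}',
    params:
        link=chromo2link,
    shell:
        # capital -O sets the file wget saves to; lowercase -o would write wget's log there instead
        "wget {params.link} -O {output}"
```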
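without the expand. Snakemake then infers the value of {chromo} from the files that rule merger requests (chr_dir/1 and chr_dir/2), so each download job gets its own link. The expand function is only needed to aggregate over wildcard values, as you do in rule merger.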
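To see why an expand with no values fails while the one in rule merger works, here is a rough plain-Python sketch of what expand does: it fills every combination of the supplied values into the pattern. `expand_sketch` is a hypothetical, simplified stand-in written for illustration, not Snakemake's actual implementation (the real expand also supports other combinators and options).

```python
import itertools


def expand_sketch(pattern, **wildcard_values):
    """Hypothetical, simplified stand-in for Snakemake's expand().

    Fills every combination of the supplied wildcard values into the
    pattern. If the pattern contains a wildcard with no supplied values,
    str.format() raises KeyError, which loosely mirrors Snakemake's
    "No values given for wildcard" error.
    """
    names = list(wildcard_values)
    combos = itertools.product(*(wildcard_values[n] for n in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


dwn_lnks = {
    '1': 'https://molb7621.github.io/workshop/_downloads/sample.fa',
    '2': 'https://molb7621.github.io/workshop/_downloads/sample.fa',
}

# Like rule merger: values are given, so we get one path per chromosome.
print(expand_sketch("chr_dir/{chromo}", chromo=dwn_lnks.keys()))
# ['chr_dir/1', 'chr_dir/2']

# Like the original rule download: no values for {chromo}, so it fails.
try:
    expand_sketch("chr_dir/{chromo}")
except KeyError as err:
    print("no values given for wildcard", err)
```

A plain `'chr_dir/{chromo}'` in a rule's output skips this step entirely and leaves the wildcard open, which is what lets Snakemake match it against requested files.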
Also have a look at the official Snakemake tutorial, which explains this in detail.
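The question also mentions that the Python code in the merger rule's `run` block fails. Inside `run`, `input` and `output` are already Python objects, so `{output}` is a Python set literal (which `open` rejects), and `"{chromo}"` is just a literal string that is never filled in. A minimal sketch of the merge step, assuming a hypothetical helper named `concatenate_files` defined in (or imported into) the Snakefile:

```python
import os
import shutil
import tempfile


def concatenate_files(input_paths, output_path):
    """Hypothetical helper: write each file in input_paths, in order, to output_path."""
    # "wb" instead of "a+": a rerun overwrites the merged file rather than appending duplicates
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                # copy in chunks instead of line by line
                shutil.copyfileobj(src, out)


if __name__ == "__main__":
    # Small demo with two temporary FASTA-like files.
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for name, text in [("1", ">chr1\nACGT\n"), ("2", ">chr2\nTTGA\n")]:
            path = os.path.join(tmp, name)
            with open(path, "w") as fh:
                fh.write(text)
            parts.append(path)
        merged = os.path.join(tmp, "merged.fa")
        concatenate_files(parts, merged)
        with open(merged) as fh:
            print(fh.read(), end="")
```

In rule merger, the `run:` section would then reduce to `concatenate_files(input, output[0])`, since `input` iterates over the chr_dir paths and `output[0]` is the path of the merged file. For a plain concatenation, `shell: "cat {input} > {output}"` does the same job without any Python.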