Multiple inputs and outputs in a single rule Snakemake file

Tags: python-3.x, snakemake

I am getting started with Snakemake and I have a very basic question whose answer I couldn't find in the Snakemake tutorial.

I want to create a single-rule Snakefile to download multiple files on Linux, one by one. 'expand' cannot be used in the output because the files need to be downloaded one by one, and wildcards cannot be used because it is the target rule.

The only way that comes to mind is something like the following, which doesn't work properly. I cannot figure out how to send the downloaded items to a specific directory with specific names such as 'downloaded_files.dwn' using {output}, so that they can be used in later steps:

links=[link1,link2,link3,....]
rule download:
    output:
        "outdir/{downloaded_file}.dwn"
    params:
        shellCallFile='callscript',
    run:
        callString=''
        for item in links:
            callString+='wget str(item) -O '+{output}+'\n'
        call('echo "' + callString + '\n" >> ' + params.shellCallFile, shell=True)
        call(callString, shell=True)

I would appreciate any hint on how this should be solved and which part of Snakemake I didn't understand well.

asked Jun 15 '17 07:06

user3015703

1 Answer

Here is a commented example that could help you solve your problem:

# Create some way of associating output files with links
# The output file names will be built from the keys: "chain_{key}.gz"
# One could probably directly use output file names as keys 
links = {
    "1" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "2" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "3" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/chain_1.gz", "outdir/chain_2.gz", "outdir/chain_3.gz"]
        # Note that we don't need to use {output} in the "run" or "shell" part.
        # This list will be used if we later add rules
        # that use the files generated by the present rule.
        expand("outdir/chain_{n}.gz", n=links.keys())
    run:
        # The sort is there to ensure the files are in the 1, 2, 3 order.
        # We could use an OrderedDict if we wanted an arbitrary order.
        for link_num in sorted(links.keys()):
            shell("wget {link} -O outdir/chain_{n}.gz".format(link=links[link_num], n=link_num))
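
For readers who want to check the bookkeeping outside of Snakemake, here is a minimal plain-Python sketch of what the rule above declares and what it runs. It is only an illustration: the URLs are hypothetical placeholders, the list comprehension stands in for the expand call, and nothing is actually downloaded.

```python
# Placeholder URLs (hypothetical), standing in for the real download links.
links = {
    "1": "http://example.org/first.gz",
    "2": "http://example.org/second.gz",
    "3": "http://example.org/third.gz",
}

# What expand("outdir/chain_{n}.gz", n=links.keys()) declares as outputs,
# written here as an ordinary list comprehension.
declared_outputs = ["outdir/chain_{n}.gz".format(n=key) for key in links]

# The commands the "run" block builds, one per key, in sorted order.
commands = [
    "wget {link} -O outdir/chain_{n}.gz".format(link=links[key], n=key)
    for key in sorted(links)
]

# The path that follows "-O" in each command.
written_paths = [command.split(" -O ")[1] for command in commands]

for command in commands:
    print(command)
```

The point of the comparison is that every path in written_paths also appears in declared_outputs. Since the rule never refers to {output}, keeping the two in sync is the author's job; Snakemake only checks after the job that the declared files exist.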

And here is another way of doing it, which uses arbitrary names for the downloaded files and makes use of output (although a bit artificially):

links = [
    ("foo_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz"),
    ("bar_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz"),
    ("baz_chain.gz", "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz")]


rule download:
    output:
        # We inform snakemake that this rule will generate
        # the following list of files:
        # ["outdir/foo_chain.gz", "outdir/bar_chain.gz", "outdir/baz_chain.gz"]
        ["outdir/{f}".format(f=filename) for (filename, _) in links]
    run:
        for i in range(len(links)):
            # output is a list, so we can access its items by index
            shell("wget {link} -O {chain_file}".format(
                link=links[i][1], chain_file=output[i]))
        # using a direct loop over the pairs (filename, link)
        # could be considered "cleaner"
        # for (filename, link) in links:
        #     shell("wget {link} -O outdir/{filename}".format(
        #         link=link, filename=filename))
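
A third variant, sketched here in plain Python so it can be tried without Snakemake, pairs each link with its output path using zip instead of an index. This is an assumption about how one might write the loop, not something the rule above requires; the URLs are hypothetical placeholders and the output list is rebuilt by hand to mimic what the rule declares.

```python
# Placeholder URLs (hypothetical), standing in for the real download links.
links = [
    ("foo_chain.gz", "http://example.org/foo.gz"),
    ("bar_chain.gz", "http://example.org/bar.gz"),
    ("baz_chain.gz", "http://example.org/baz.gz"),
]

# Stand-in for the rule's output list, built the same way as in the rule.
output = ["outdir/{f}".format(f=filename) for (filename, _) in links]

# zip walks both lists together, so each link meets its own output path
# without any index arithmetic.
commands = [
    "wget {link} -O {chain_file}".format(link=link, chain_file=chain_file)
    for (_, link), chain_file in zip(links, output)
]

for command in commands:
    print(command)
```

Inside the actual run block, each string would be passed to shell(...) rather than printed.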

Finally, an example where each file is downloaded by its own job, so the three downloads can run in parallel using snakemake -j 3:

# To use os.path.join,
# which is more robust than manually writing the separator.
import os

# Association between output files and source links
links = {
    "foo_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAptMan1.over.chain.gz",
    "bar_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToAquChr2.over.chain.gz",
    "baz_chain.gz" : "http://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToBisBis1.over.chain.gz"}


# Make this association accessible via a function of wildcards
def chainfile2link(wildcards):
    return links[wildcards.chainfile]


# First rule will drive the rest of the workflow
rule all:
    input:
        # expand generates the list of the final files we want
        expand(os.path.join("outdir", "{chainfile}"), chainfile=links.keys())


rule download:
    output:
        # We inform snakemake what this rule will generate
        os.path.join("outdir", "{chainfile}")
    params:
        # using a function of wildcards in params
        link = chainfile2link,
    shell:
        """
        wget {params.link} -O {output}
        """

answered Sep 24 '22 23:09

bli
