
snakemake: is there a way to specify an output directory for each rule?

Tags: snakemake

The scripts I use all write their output files to the current directory, i.e. wherever the script was called. In my shell-script pipeline I therefore had cd commands to move into a particular directory before running each command, so the output files ended up in the relevant directories. My scripts don't have a parameter for the output directory, and most of them deduce the output file names from the input. That has worked pretty well for me.

Now I keep running into this output directory issue, because snakemake seems to write the files to the directory where the Snakefile is. I could modify all the scripts to take an additional parameter for the output directory, but modifying that many scripts is going to be a pain. Is there any way to specify where the output should go for each specific rule?

olala asked Jan 04 '23 23:01


2 Answers

One hack would be to cd into the output directory first, i.e. "cd $(dirname {output[0]})". This needs to be the first of your shell commands.

Having said this, it would be better to change the script to accept an output directory as an argument.
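One caveat with this hack: Snakemake creates missing output directories before a job runs, so the cd itself should succeed, but relative input paths stop resolving once the working directory has changed. A minimal sketch of one way around that, assuming the helper is defined at the top of the Snakefile. The name `abs_inputs` is hypothetical and not part of Snakemake; it relies on Snakemake calling params functions with `wildcards` and, if the function asks for it by name, `input`.

```python
import os

def abs_inputs(wildcards, input):
    """Return the rule's input files as absolute paths.

    Hypothetical helper, meant to be passed to `params` in a Snakefile so
    that the paths stay valid after the shell command changes directory.
    """
    # abspath resolves against the directory Snakemake is running in,
    # which is where the relative input paths were defined.
    return [os.path.abspath(str(path)) for path in input]
```

In a rule this could be wired up as `params: src=abs_inputs`, with a shell command that starts with `cd $(dirname {output[0]})` and then calls the script on `{params.src}`.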

Andreas answered Feb 19 '23 06:02


Here is an example rule that I use in one of my snakefiles:

rule link_raw_data:
    output:
        OPJ(data_dir, "{lib}_{rep}.fastq.gz"),
    params:
        directory = data_dir,
        shell_command = lib2data,
    message:
        "Making link to raw data {output}."
    shell:
        """
        (
        cd {params.directory}
        {params.shell_command}
        )
        """

This is probably a bit different from your situation, but hopefully some of the techniques can help. In particular, note the parentheses in the shell section (they run the commands in a subshell, so the cd does not affect anything outside them) and the usage of a params section to define the output directory.

I'm not sure I'm doing this in the most elegant way, but it works.

data_dir is a parameter read from a config file.

lib2data is a function that generates commands based on the values of some wildcards. Of course, I have to ensure that these commands use the correct input file paths (and, in this case, that they produce output consistent with what the output section declares). In your case, you may simply have hard-coded shell commands, possibly using some of the rule's input.
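The body of lib2data is not shown here, so the following is only a guess at its general shape: a params function that receives the job's wildcards and returns a command string. The `RAW_FILES` table, its key and its path are all hypothetical placeholders.

```python
# Hypothetical lookup table; in practice this might be built from the config file.
RAW_FILES = {
    ("WT", "1"): "/path/to/raw/WT_1.fastq.gz",
}

def lib2data(wildcards):
    """Build the shell command that links one raw file into the data directory.

    Snakemake calls a params function with the job's wildcards. The returned
    command runs after the rule has changed into data_dir, so the link name is
    relative and has to match the rule's output pattern.
    """
    raw_path = RAW_FILES[(wildcards.lib, wildcards.rep)]
    link_name = f"{wildcards.lib}_{wildcards.rep}.fastq.gz"
    return f"ln -s {raw_path} {link_name}"
```

Using an absolute path for the raw file keeps the command valid regardless of which directory the rule has changed into.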

More streamlined example

rule run_script1:
    input:
        "path/to/initial/input"
    output:
        "script1_out/output1"
    shell:
        """
        cd script1_out
        # The input path is relative to the original directory, so step back up.
        script1 ../{input}
        """

rule run_script2:
    input:
        "script1_out/output1"
    output:
        "script2_out/output2"
    shell:
        """
        cd script2_out
        script2 ../{input}
        """

Starting from these examples, you can add wildcards to the input and output paths and, if necessary, use functions of the wildcards for the input or params.
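As a sketch of such a function, assuming a hypothetical `{sample}` wildcard in the output of the second rule and a matching file name pattern for the first one (neither appears in the examples above):

```python
def script2_input(wildcards):
    """Map the requested sample to the file the first script produced for it.

    Hypothetical input function: Snakemake calls it with the wildcards
    matched from the rule's output and uses the returned path as input.
    """
    return f"script1_out/{wildcards.sample}.out1"
```

The rule would then declare `input: script2_input` instead of a fixed path.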

bli answered Feb 19 '23 06:02