Snakemake + docker example, how to use volumes

Let's take a simple Snakefile like

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    shell:
        "somecommand {input} {output}"

I want to generalize the plot rule so that it can be run inside a Docker container, with something like

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    singularity:
        "docker://joseespinosa/docker-r-ggplot2"
    shell:
        "somecommand {input} {output}"

If I understand correctly, when I run snakemake --use-singularity, somecommand runs inside the Docker container, where the input CSV files cannot be found unless some volume is configured for the container.

Can you please provide a small working example describing how volumes can be configured in the Snakefile or other Snakemake files?

mox asked Oct 10 '18 14:10


1 Answer

When you run snakemake and tell it to use singularity images, you do this:

snakemake --use-singularity

You can also pass additional arguments to singularity, including bind points, like this:

snakemake --use-singularity --singularity-args "-B /path/outside/container/:/path/inside/container/"

Now, if your CSV file is in /path/outside/container/, somecommand can see it inside the container under /path/inside/container/.
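For the simplest case, where the Snakefile keeps its relative paths (raw/ and plots/), one option is to bind the project directory to the same path inside the container. The sketch below assumes the command is launched from the directory that holds the Snakefile; PROJECT_DIR and BIND_ARGS are names made up for illustration. When a Singularity bind spec has no ":dest" part, the destination defaults to the source path. In many installations the working directory is already mounted by default, in which case this bind is harmless but unnecessary.

```shell
# Sketch under assumptions: run from the directory that contains the Snakefile.
# PROJECT_DIR and BIND_ARGS are hypothetical variable names.
PROJECT_DIR="$PWD"

# No ":dest" part, so the directory appears at the same path in the container
# and relative paths such as raw/dataset1.csv resolve the same on both sides.
BIND_ARGS="-B ${PROJECT_DIR}"

# Printed as a preview rather than executed; drop the leading "echo" to run it.
echo snakemake --use-singularity --singularity-args "${BIND_ARGS}"
```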

Bear in mind, if your inside and outside paths are not identical, you'll need to use both paths in your snakemake rule, in different sections. This is how I've done it:

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    params:
        i = "inside/container/input/{dataset}.csv",
        o = "inside/container/output/{dataset}.pdf"
    singularity:
        "docker://joseespinosa/docker-r-ggplot2"
    shell:
        "somecommand {params.i} {params.o}"

When you run this snakefile, bind raw/ to inside/container/input/, and bind plots/ to inside/container/output/. Snakemake will look for the input/output files on your local machine, but will give the container the command to run with the inside-container paths, and everything will be awesome.
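As a rough sketch, the two binds described above could be passed in a single invocation like this. It assumes the command is run from the project directory, and that the container-side destinations are written as absolute paths (/inside/container/input and /inside/container/output), so the paths in params would need the same leading slash. These paths and the BIND_ARGS variable are illustrative placeholders, not anything required by Snakemake or Singularity. Singularity accepts several bind specs as a comma-separated list.

```shell
# Sketch under assumptions: run from the directory that contains raw/ and plots/.
# The container-side paths are placeholders; they must match the paths in params.
BIND_ARGS="-B ${PWD}/raw:/inside/container/input,${PWD}/plots:/inside/container/output"

# Printed as a preview rather than executed; drop the leading "echo" to run it.
echo snakemake --use-singularity --singularity-args "${BIND_ARGS}"
```

Each bind spec has the form host_path:container_path, so Snakemake still checks raw/ and plots/ on the host while somecommand reads and writes through the container-side paths.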

TL;DR: Local paths in input and output, container paths in params and shell. Bind local and container paths in the command line invocation.

Bari Ballew answered Sep 22 '22 23:09