Let's say I have a simple Snakefile like this:
```
rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    shell:
        "somecommand {input} {output}"
```
I want to generalize the `plot` rule so that it can run inside a Docker container, with something like:
```
rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    singularity:
        "docker://joseespinosa/docker-r-ggplot2"
    shell:
        "somecommand {input} {output}"
```
If I understood correctly, when I run `snakemake --use-singularity`, `somecommand` runs inside the Docker container, where the input CSV files cannot be found without some volume configuration of the container.
Can you please provide a small working example showing how volumes can be configured in the Snakefile or in other Snakemake files?
Docker volumes are stored in and managed by Docker, and they are the preferred way to persist data for Docker containers and services. When a container is started, Docker loads the read-only image layers, adds a read-write layer on top of the image stack, and mounts volumes onto the container filesystem.
While bind mounts depend on the directory structure and OS of the host machine, volumes are managed entirely by Docker. Volumes have several advantages over bind mounts: they are easier to back up or migrate, and you can manage them with Docker CLI commands or the Docker API.
Multiple containers can mount the same volume when they need access to shared data. Docker creates a local volume by default, but a volume driver can be used to share data across multiple machines. Finally, Docker also has `--volumes-from` to mount the volumes of another container into a new one.
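To make the volume versus bind-mount distinction concrete: with `docker run -v source:target`, Docker treats an absolute host path as the source of a bind mount and a bare name as a named volume that Docker manages itself. The following is a minimal Python sketch that only assembles such a command line (it never calls Docker); `docker_run_args`, the host directory and the volume name are hypothetical. Keep in mind that with `--use-singularity`, Snakemake runs the image through Singularity rather than Docker, so these Docker options are background here, not something Snakemake passes on.

```python
import shlex


def docker_run_args(image, command, mounts):
    """Assemble a `docker run` argument list with one -v per mount.

    For each source -> target pair, Docker interprets an absolute host
    path as a bind mount and a bare name as a Docker-managed named volume.
    """
    args = ["docker", "run", "--rm"]
    for source, target in mounts.items():
        args += ["-v", f"{source}:{target}"]
    args.append(image)
    args += shlex.split(command)  # split the command into separate arguments
    return args


if __name__ == "__main__":
    args = docker_run_args(
        "joseespinosa/docker-r-ggplot2",
        "somecommand /data/raw/dataset1.csv /results/dataset1.pdf",
        {
            "/home/user/project": "/data",  # bind mount: hypothetical host dir
            "plots-volume": "/results",     # named volume: hypothetical name
        },
    )
    print(shlex.join(args))
```

Because the list form keeps each argument separate, it could also be handed to `subprocess.run` without any shell quoting.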
When you run snakemake and tell it to use singularity images, you do this:
```
snakemake --use-singularity
```
You can also pass additional arguments to singularity, including bind points, like this:
```
snakemake --use-singularity --singularity-args "-B /path/outside/container/:/path/inside/container/"
```
Now, if your CSV file is in `/path/outside/container/`, `somecommand` can see it inside the container at `/path/inside/container/` without issue.
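Several directories can be bound at once: Singularity's `-B`/`--bind` option accepts a comma-separated list of `src:dest` pairs, and the whole option has to reach Snakemake as a single `--singularity-args` value. A minimal Python sketch that assembles that command line from a mapping; the helper names `bind_option` and `snakemake_command` and the second pair of paths are hypothetical.

```python
import shlex


def bind_option(binds):
    """Format host -> container directory pairs as one Singularity -B option.

    Singularity accepts several bind specs in a single -B as a
    comma-separated list of src:dest pairs.
    """
    return "-B " + ",".join(f"{src}:{dest}" for src, dest in binds.items())


def snakemake_command(binds):
    """Build the snakemake argument list (hypothetical helper).

    The bind option is kept as ONE list element so that it reaches
    snakemake as a single --singularity-args value.
    """
    return [
        "snakemake",
        "--use-singularity",
        "--singularity-args",
        bind_option(binds),
    ]


if __name__ == "__main__":
    # Example paths only; replace them with your own directories.
    binds = {
        "/path/outside/container/": "/path/inside/container/",
        "/another/host/dir": "/another/container/dir",
    }
    # shlex.join quotes the bind option so it can be pasted into a shell.
    print(shlex.join(snakemake_command(binds)))
```

The printed command wraps the bind option in single quotes, which has the same effect as the double quotes in the command above.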
Bear in mind that if your inside and outside paths are not identical, you'll need to use both paths in your Snakemake rule, in different sections. This is how I've done it:
```
rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    params:
        i = "/inside/container/input/{dataset}.csv",
        o = "/inside/container/output/{dataset}.pdf"
    singularity:
        "docker://joseespinosa/docker-r-ggplot2"
    shell:
        "somecommand {params.i} {params.o}"
```

When you run this Snakefile, bind `raw/` to `/inside/container/input/` and `plots/` to `/inside/container/output/`. Snakemake will look for the input and output files on your local machine, but will give the container a command that uses the inside-container paths, and everything will be awesome.
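Rather than writing every container path by hand, the container paths in `params` can be derived from the local paths in `input` and `output` with a small helper at the top of the Snakefile (a Snakefile accepts plain Python). A minimal sketch, assuming the hypothetical `BINDS` mapping below mirrors the `-B` option you pass on the command line; `BINDS` and `to_container` are illustrative names, not part of Snakemake.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: local directory -> directory it is bound to
# inside the container. Keep it in sync with the -B option passed
# through --singularity-args.
BINDS = {
    "raw": "/inside/container/input",
    "plots": "/inside/container/output",
}


def to_container(path, binds=BINDS):
    """Translate a local path into the path the container sees.

    Raises ValueError for paths outside every bound directory, since the
    container could not see such a file anyway.
    """
    local = PurePosixPath(path)
    for host_dir, container_dir in binds.items():
        try:
            relative = local.relative_to(host_dir)
        except ValueError:
            continue  # not under this bound directory; try the next one
        return str(PurePosixPath(container_dir) / relative)
    raise ValueError(f"{path} is not inside any bound directory")


if __name__ == "__main__":
    print(to_container("raw/dataset1.csv"))
    print(to_container("plots/dataset1.pdf"))
```

In the `plot` rule, the two params could then read `i = lambda wildcards, input: to_container(input[0])` and `o = lambda wildcards, output: to_container(output[0])` (Snakemake passes `input` and `output` to params functions that ask for them), with a matching invocation along the lines of `snakemake --use-singularity --singularity-args "-B $PWD/raw:/inside/container/input,$PWD/plots:/inside/container/output"`. Keeping one mapping for both sides means the rule and the bind option cannot drift apart.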
TL;DR: local paths go in `input` and `output`, container paths go in `params` and `shell`. Bind the local paths to the container paths in the command-line invocation.