My end goal is to host a Snakemake workflow in a GitHub repo so that it can be used as a Snakemake module. I'm testing locally before I host it, but I'm running into an issue: I cannot access the scripts in the module's directory. Snakemake looks for the scripts relative to the current (local) workflow directory, and I obviously cannot copy them there if my end goal is to host the module remotely.
I don't see this problem when accessing Conda environments in the remote directory. Is there a way to mimic this behavior for a scripts directory? I would be open to an absolute path reference if it can be applied to access a remote script directory. Here's a dummy example reproducing the error:
Snakemake version: 6.0.5
Tree structure:
.
├── external_module
│   ├── scripts
│   │   ├── argparse
│   │   └── print.py
│   └── Snakefile
└── Snakefile
Local snakefile:
module remote_module:
    snakefile: "external_module/Snakefile"

use rule * from remote_module

use rule foo from remote_module with:
    input:
        "complete.txt"
External Snakefile:
rule foo:
    input:
        "complete.txt"

rule bar:
    output:
        touch(temp("complete.txt"))
    shell:
        "scripts/print.py -i foo"
print.py
import argparse


def get_parser():
    parser = argparse.ArgumentParser(
        description='dummy snakemake function',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("-i", default=None,
                        help="item to be printed")
    return parser


def main():
    args = get_parser().parse_args()
    print(args.i)


if __name__ == '__main__':
    main()
Snakemake pipeline execution
(base) bobby@SongBird:~/remote_snakemake_test$ snakemake --cores 4
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       bar
    1       foo
    2

[Fri Mar 26 10:12:50 2021]
rule bar:
    output: complete.txt
    jobid: 1

/usr/bin/bash: scripts/print.py: No such file or directory
[Fri Mar 26 10:12:50 2021]
Error in rule bar:
    jobid: 1
    output: complete.txt
    shell:
        scripts/print.py -i foo
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/bobby/remote_snakemake_test/.snakemake/log/2021-03-26T101250.118440.snakemake.log
Any insight would be very appreciated. Thanks!
I'm struggling with modules and their associated scripts as well. AFAIK the 'shell' entry does NOT keep track of the external path, whereas the 'script' entry does. So consider an external module with the following rules:
rule foo_shell:
    ...
    shell:
        "scripts/somescript -a somearg ..."

rule foo_script:
    ...
    script:
        "scripts/somescript_without_arguments.py"
Rule foo_shell will look for the script somescript in the subdirectory scripts relative to the main (local) Snakefile, which in your example obviously doesn't exist. Rule foo_script will look for the script somescript_without_arguments.py in the scripts directory next to the remote Snakefile, i.e. in your external_module/scripts directory.
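To make the difference concrete, here is a small Python model of the two lookups. It is only a sketch of the behaviour described above, not Snakemake's actual code: the function names are made up, and I'm assuming the working directory is the directory of the local Snakefile (the path is taken from the log in the question).

```python
from pathlib import PurePosixPath


def resolve_shell_path(workdir, relative):
    """Model of `shell:`: the command runs in the working directory,
    so a relative path is looked up from there."""
    return PurePosixPath(workdir) / relative


def resolve_script_path(snakefile, relative):
    """Model of `script:`: the path is taken relative to the directory
    of the Snakefile that defines the rule."""
    return PurePosixPath(snakefile).parent / relative


# Paths from the question; the working directory is an assumption.
workdir = "/home/bobby/remote_snakemake_test"
module_snakefile = workdir + "/external_module/Snakefile"

shell_target = resolve_shell_path(workdir, "scripts/print.py")
script_target = resolve_script_path(module_snakefile, "scripts/print.py")

print(shell_target)   # where the shell command looks (file does not exist)
print(script_target)  # where the script entry looks (file exists)
```

Under this model the shell lookup misses the external_module directory entirely, which matches the "No such file or directory" error in the question.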
Scripts called via the script entry cannot be passed command-line arguments; instead they have access to a variable 'snakemake' (see the docs). Also, only a few languages are supported, e.g. Python, R, ...
I made some changes to your example which worked for me:
local/main Snakefile:
module remote_module:
    snakefile: "external_module/Snakefile"
    config: config

use rule * from remote_module

use rule foo from remote_module with:
    input:
        "complete.txt"
external_module/Snakefile:
rule foo:
    input:
        "complete.txt"

rule bar:
    output:
        touch(temp("complete.txt"))
    params:
        i="foo2"
    script:
        "scripts/print2.py"
external_module/scripts/print2.py (ugly, but informative :-) )
print(snakemake.params["i"])
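If you want to keep the argparse interface of the original print.py as well, one possible approach is a script that takes its value from the snakemake object when there is one and from the command line otherwise. This is only a sketch: get_item and fake_snakemake are names I made up, and the stand-in object models nothing but params so that the example runs outside Snakemake.

```python
import argparse
from types import SimpleNamespace


def get_item(snakemake_obj=None, argv=None):
    """Return the item to print: from Snakemake params if an object is
    given, otherwise from the command-line arguments in argv."""
    if snakemake_obj is not None:
        return snakemake_obj.params["i"]
    parser = argparse.ArgumentParser(description="dummy snakemake function")
    parser.add_argument("-i", default=None, help="item to be printed")
    return parser.parse_args(argv).i


# Stand-in for the object Snakemake injects; only `params` is modelled.
fake_snakemake = SimpleNamespace(params={"i": "foo2"})

print(get_item(snakemake_obj=fake_snakemake))  # value from params
print(get_item(argv=["-i", "foo"]))            # value from the command line
```

Inside a real script you would pass globals().get("snakemake") as snakemake_obj, which is None when the file is run directly from a shell.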
Something that confuses me is that a script used in the external module can itself pull in additional Python or R scripts, by importing (Python) or sourcing (R) them in the called script. The following external Python script works just fine, assuming script1.py and script2.py are both in the scripts directory of the external module:
# script1.py
import script2
...
But so far I have not been able to execute a bash script from, e.g., a running Python script. Something like subprocess.run(["remote_module_script.sh", "arg"]) again looks for the bash script relative to the directory that contains the main, local Snakefile. It seems there is no way to run bash scripts in remote modules, except using the methods explained in Troy's answer. As I want to be able to use modules completely external to the current filesystem (e.g. directly from GitHub), this option doesn't work for me.
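For a module that lives on the local filesystem, the relative lookup can at least be avoided by building an absolute path to the bash script before calling it. The sketch below shows only that Python side. The helper name run_sibling_script is made up, and how to get the script directory inside a Snakemake-run script (for example from the script's own location) is an assumption I have not verified for remote modules.

```python
import subprocess
import tempfile
from pathlib import Path


def run_sibling_script(script_dir, name, *args):
    """Run a shell script located in script_dir, independent of the
    current working directory, and return its standard output."""
    script_path = Path(script_dir).resolve() / name
    # The command is passed as a list, and `sh` is invoked explicitly
    # so the file does not need an executable bit.
    result = subprocess.run(
        ["sh", str(script_path), *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Demo: a throwaway directory stands in for external_module/scripts.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "remote_module_script.sh").write_text('echo "got: $1"\n')
    demo_output = run_sibling_script(tmp, "remote_module_script.sh", "arg")

print(demo_output, end="")
```

This does not solve the GitHub-hosted case, since the bash script still has to exist somewhere on disk.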
I hope I'm wrong with respect to bash scripts, and that somebody will explain better how external modules and external (bash) scripts actually work.