I'm tearing my hair out here, hopefully someone can help me.
Running snakemake 4.8.0
I have a snakemake pipeline, which I run with two conda envs and --use-conda and it works fine when run as a standalone pipeline.
However, when I run on our cluster, I get the error:
"The 'conda' command is not available in $PATH."
Now. Anaconda is installed on our cluster, but we need to activate it on nodes with:
module load anaconda
Also, module is defined as a function, so I have to source a couple of things first. Therefore, at the top of my Snakefile, I have:
shell.prefix("source $HOME/.bashrc; source /etc/profile; module load anaconda; ")
This doesn't solve the problem.
I even put module load anaconda in my .bashrc, and that still doesn't work. Only on cluster execution do I get the error about conda not being found.
Other changes to my .bashrc are picked up by snakemake, so I have no idea why it is having problems with conda.
I even created a conda env, installed snakemake and conda into that env, and activated the env both in the submission script and in the Snakefile:
shell.prefix("source $HOME/.bashrc; source /etc/profile; module load anaconda; source activate MAGpy-3.5; ")
And it still says "The 'conda' command is not available in $PATH."
Literally tearing my hair out.
As an aside, I submit using qsub -S /bin/bash and also use shell.executable("/bin/bash"), but the temp shell scripts created in .snakemake are run by /bin/sh. Is that expected?
Please help me!
I always have to use:
set +u; {params.env}; set -u
(where {params.env} is loading up a conda command from my config.yaml) when invoking a conda environment within the shell command of a Snakefile, because Snakemake automatically prepends shell commands with set -u.
Not sure if this will fix your problem, but worth a spin?
You can provide a custom "jobscript template", have you tried that? The default one looks like this:
#!/bin/sh
# properties = {properties}
{exec_job}
So perhaps yours could look like this:
#!/bin/bash
# properties = {properties}
module add anaconda
{exec_job}
and then you refer to this file with the --jobscript parameter when you run snakemake.
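To make the role of the two placeholders concrete, here is a rough sketch of how such a template gets filled in. It assumes the substitution behaves like ordinary Python string formatting; render_jobscript and the example values are hypothetical and this is not Snakemake's actual code.

```python
import json

# Custom jobscript template from the answer above; {properties} and
# {exec_job} are the same two placeholders the default template uses.
JOBSCRIPT_TEMPLATE = """#!/bin/bash
# properties = {properties}
module add anaconda
{exec_job}
"""


def render_jobscript(template, properties, exec_job):
    """Fill the template with plain str.format (an assumption, see above)."""
    return template.format(
        properties=json.dumps(properties),
        exec_job=exec_job,
    )


# Hypothetical values, for illustration only.
script = render_jobscript(
    JOBSCRIPT_TEMPLATE,
    properties={"rule": "example_rule"},
    exec_job="python -m snakemake --snakefile Snakefile",
)
print(script)
```

The point is only that anything placed above {exec_job}, such as the module line, runs on the node before snakemake itself is started there.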
P.S. If you look in the code, the {exec_job} placeholder is filled in with a call to python -m snakemake without any PATH setting, which I think contributes to the error you are seeing.
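One quick way to test this is to run a few lines of Python inside the same job environment and see whether conda is reachable from there. This is only a sketch: it assumes the failing check amounts to a PATH lookup done by the Python process, which I have not confirmed, and find_conda is a hypothetical helper name.

```python
import os
import shutil


def find_conda(environ=None):
    """Report where (if anywhere) 'conda' resolves on the given PATH."""
    environ = os.environ if environ is None else environ
    path = environ.get("PATH", "")
    return {
        # shutil.which returns None when the command is not found on PATH.
        "conda": shutil.which("conda", path=path),
        "path_entries": path.split(os.pathsep) if path else [],
    }


if __name__ == "__main__":
    report = find_conda()
    print("conda resolves to:", report["conda"])
    for entry in report["path_entries"]:
        print("  PATH entry:", entry)
```

If this prints None for conda when run through qsub but a real path when run interactively, the module load is not reaching the process that snakemake is started in.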
What module does is generally nothing more than modifying PATH and other environment variables. This is also true for conda environments and source activate.
As an example, on our cluster QIIME2 is installed in a conda environment, but its modulefile is
prepend-path PATH /opt/sw/qiime/2.2018.2/bin
prepend-path PYTHONPATH /opt/sw/qiime/2.2018.2/lib/python3.5/site-packages
while our conda modulefile is
prepend-path PATH /opt/sw/conda/3/bin
So assuming MAGpy-3.5 is your conda environment, you could
(a) make a module for your MAGpy pipeline and load it, ignoring that it is a conda environment, or
(b) make snakemake run with a modified PATH (I do not know how snakemake deals with environment variables), or
(c) add the path to your conda installation or your MAGpy installation in your .bashrc
Both (b) and (c) defeat the purpose of having a module system IMO, but I've found that anaconda itself is sort of redundant with modulefiles. In our cluster, while we install some software with anaconda, we never make the user load them with source activate, and write modulefiles for those instead.
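To illustrate the point that loading a module mostly just edits environment variables, here is a small sketch of what a prepend-path line amounts to. The prepend_path function is a hypothetical stand-in, not part of any module system, and the starting PATH is made up; the two directories are the ones from the QIIME2 modulefile above.

```python
import os


def prepend_path(env, variable, value):
    """Return a copy of env with value placed at the front of variable."""
    updated = dict(env)
    current = updated.get(variable)
    # If the variable is unset or empty, the new value stands alone.
    updated[variable] = value if not current else value + os.pathsep + current
    return updated


# Made-up starting environment, for illustration only.
env = {"PATH": os.pathsep.join(["/usr/bin", "/bin"])}

# Equivalent of the two prepend-path lines in the QIIME2 modulefile.
env = prepend_path(env, "PATH", "/opt/sw/qiime/2.2018.2/bin")
env = prepend_path(
    env, "PYTHONPATH", "/opt/sw/qiime/2.2018.2/lib/python3.5/site-packages"
)

print(env["PATH"])
print(env["PYTHONPATH"])
```

Whatever ends up first in PATH wins the lookup, which is why it does not matter to the shell whether the directory came from a modulefile, source activate, or a line in .bashrc.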