
Snakemake: I keep getting The 'conda' command is not available in $PATH. when running on SGE cluster

I'm tearing my hair out here, hopefully someone can help me.

Running snakemake 4.8.0

I have a snakemake pipeline, which I run with two conda envs and --use-conda and it works fine when run as a standalone pipeline.

However, when I run on our cluster, I get the error:

"The 'conda' command is not available in $PATH."

Now. Anaconda is installed on our cluster, but we need to activate it on nodes with:

module load anaconda

Also, module is defined as a function, so I have to source a couple of things first. Therefore, at the top of my Snakefile, I have:

shell.prefix("source $HOME/.bashrc; source /etc/profile; module load anaconda; ")

This doesn't solve the problem.

I even put module load anaconda in my .bashrc, and that still doesn't work. The error about conda not being found appears only on cluster execution.

Other changes to my .bashrc are picked up by snakemake, so I have no idea why it is having problems with conda.

I even created a conda env, installed snakemake and conda into that env, and activated the env both in the submission script and in the Snakefile:

shell.prefix("source $HOME/.bashrc; source /etc/profile; module load anaconda; source activate MAGpy-3.5; ")

And it still says "The 'conda' command is not available in $PATH."

Literally tearing my hair out.

As an aside, I submit using qsub -S /bin/bash and also use shell.executable("/bin/bash"), but the temp shell scripts created in .snakemake are run by /bin/sh. Is that expected?

Please help me!

Mick Watson asked Apr 05 '18 20:04


3 Answers

I always have to use:

set +u; {params.env}; set -u

(where {params.env} is loading up a conda command from my config.yaml)

when invoking a conda environment within the shell command of a Snakefile, because Snakemake automatically prepends shell commands with set -u (as part of bash strict mode), and activation scripts can trip over unset variables.

Not sure if this will fix your problem, but worth a spin?
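
To illustrate why wrapping the activation step in set +u / set -u can matter, here is a minimal Python sketch. It runs a shell with nounset (set -u) switched on and compares a bare command against a guarded one. ACTIVATE and HYPOTHETICAL_UNSET_VAR are made-up stand-ins for an activation script that reads an unset variable; this is not the real conda activation code.

```python
import shutil
import subprocess

# Prefer bash (the shell discussed in this thread); fall back to sh.
SHELL = shutil.which("bash") or "sh"

# Hypothetical stand-in for an activation script that reads a variable
# which may be unset. Not the real conda activation code.
ACTIVATE = 'echo "value=${HYPOTHETICAL_UNSET_VAR}" >/dev/null'


def run(command):
    """Run a command string in the shell and return its exit code."""
    result = subprocess.run([SHELL, "-c", command], stderr=subprocess.DEVNULL)
    return result.returncode


# With nounset active, expanding the unset variable aborts the shell.
strict_rc = run("set -u; " + ACTIVATE)

# Relaxing nounset around the activation step lets it complete,
# and nounset is restored afterwards for the rest of the command.
guarded_rc = run("set -u; set +u; " + ACTIVATE + "; set -u; true")

print("strict exit code:", strict_rc)
print("guarded exit code:", guarded_rc)
```

The strict run should exit non-zero and the guarded run should exit zero, which is the behaviour the wrapper relies on.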

Jon answered Nov 02 '22 04:11


Have you tried providing a custom "jobscript template"? The default one looks like this:

#!/bin/sh
# properties = {properties}
{exec_job}

So perhaps yours could look like this:

#!/bin/bash
# properties = {properties}
module add anaconda
{exec_job}

and then you refer to this file with the --jobscript parameter when you run snakemake.

P.S. If you look in the code, {exec_job} is filled in with a call to python -m snakemake without any PATH setting, which I think contributes to the error you are seeing.
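
A rough sketch of what the custom template buys you, assuming the two placeholders are filled by simple substitution (plain str.format stands in here for whatever Snakemake does internally, and the property values and target name are hypothetical):

```python
import json

# Custom template from this answer: bash instead of sh, plus a module load.
TEMPLATE = """#!/bin/bash
# properties = {properties}
module add anaconda
{exec_job}
"""


def render_jobscript(template, properties, exec_job):
    """Fill the two placeholders. str.format is an assumption, not Snakemake's code."""
    return template.format(properties=json.dumps(properties), exec_job=exec_job)


# Hypothetical values, for illustration only.
script = render_jobscript(
    TEMPLATE,
    properties={"rule": "example_rule", "jobid": 1},
    exec_job="python -m snakemake example_target",
)
print(script)
```

The point is the ordering in the rendered script: module add anaconda runs before the python -m snakemake call, so PATH already contains conda when that process starts on the node.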

Peter van Heusden answered Nov 02 '22 05:11


Generally, module does nothing more than modify PATH and other environment variables. The same is true of conda environments and source activate.

As an example, on our cluster QIIME2 is installed in a conda environment, but its modulefile is

prepend-path    PATH            /opt/sw/qiime/2.2018.2/bin
prepend-path    PYTHONPATH      /opt/sw/qiime/2.2018.2/lib/python3.5/site-packages

while our conda modulefile is

prepend-path    PATH            /opt/sw/conda/3/bin

So assuming MAGpy-3.5 is your conda environment, you could

(a) make a module for your MAGpy pipeline and load it, ignoring that it is a conda environment or

(b) make snakemake run with a modified PATH (I do not know how snakemake deals with environment variables)

(c) add the path to your conda installation or your MAGpy installation in your .bashrc

Both (b) and (c) defeat the purpose of having a module system IMO, but I've found that anaconda itself is somewhat redundant with modulefiles. On our cluster, while we install some software with anaconda, we never make users load it with source activate; we write modulefiles for it instead.
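
For option (b), a minimal Python sketch of what prepend-path amounts to, and how to check that conda then resolves. This assumes a plain PATH lookup is what matters. The helper name prepend_to_path is made up, and the demo uses a throwaway directory with a fake conda executable rather than a real installation.

```python
import os
import shutil
import stat
import tempfile


def prepend_to_path(directory, environ=os.environ):
    """Put directory at the front of PATH, like a modulefile's prepend-path."""
    current = environ.get("PATH", "")
    environ["PATH"] = directory + (os.pathsep + current if current else "")


# Throwaway directory holding a fake "conda" executable. On a real cluster
# the directory would be the conda bin directory, e.g. /opt/sw/conda/3/bin.
demo_bin = tempfile.mkdtemp()
fake_conda = os.path.join(demo_bin, "conda")
with open(fake_conda, "w") as handle:
    handle.write("#!/bin/sh\necho fake conda\n")
os.chmod(fake_conda, os.stat(fake_conda).st_mode | stat.S_IXUSR)

prepend_to_path(demo_bin)

# shutil.which searches the current process's PATH, front to back.
found = shutil.which("conda")
print("conda resolves to:", found)
```

Running shutil.which("conda") inside a job on a compute node is also a quick way to see whether the environment there matches what you get on the login node.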

H. Gourlé answered Nov 02 '22 04:11