
Nextflow does not find all my Python modules

I am trying to write a Nextflow script that runs a Python script. The Python script imports seven modules, but when it runs under Nextflow, python3 fails to find two of them (cv2 and matplotlib) and crashes. If I call the script directly from bash, it works fine. I would like to avoid creating a Docker image to run this script.

Error executing process > 'grab_images (1)'

Caused by:
  Process `grab_images (1)` terminated with an error exit status (1)

Command executed:

  python3 --version
  echo 'processing image-1.npy'
  python3 /home/hq/cv_proj/k_means2.py image-1.npy

Command exit status:
  1

Command output:
  Python 3.7.3
  processing image-1.npy

Command error:
  Traceback (most recent call last):
    File "/home/hq/cv_proj/k_means2.py", line 5, in <module>
      import matplotlib.pyplot as plt 
  ModuleNotFoundError: No module named 'matplotlib'

Work dir:
  /home/hq/cv_proj/work/7f/b787c62ec420b2b5eb490603ef913f

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

I think there is a path issue, since modules like numpy, sys, re, and time load successfully. How can I fix this?

Thanks in advance

UPDATE

To assist others who may have problems using Python in Nextflow scripts: make sure your shebang is correct. I was using

    #!/usr/bin/python 

instead of

    #!/usr/bin/python3

All of my packages were installed with pip3 and I use python3 exclusively, so the shebang has to point to python3.
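
If it is unclear which interpreter a process ends up using, a small diagnostic along these lines may help. It is only a sketch: the helper name find_missing is hypothetical, and the module list is taken from the names mentioned in this question. Running it once from bash and once inside the Nextflow process should show whether the two environments differ.

```python
#!/usr/bin/env python3
"""Report which interpreter is running and which modules it can locate."""
import importlib.util
import sys


def find_missing(modules):
    """Return the top-level module names that this interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the question; adjust to your own imports.
    wanted = ["numpy", "cv2", "matplotlib"]
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("missing:", find_missing(wanted))
```

If the interpreter path or the list of missing modules differs between the two runs, the process is picking up a different Python installation than the interactive shell.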

TheCodeNovice asked May 15 '26 14:05



1 Answer

It is best to avoid absolute paths to your script(s) in your process declarations. This section of the docs is worth taking some time to read: https://www.nextflow.io/docs/latest/sharing.html#manage-dependencies, particularly the subsection on managing third-party scripts:

Any third party script that does not need to be compiled (Bash, Python, Perl, etc) can be included in the pipeline project repository, so that they are distributed with it.

Grant the execute permission to these files and copy them into a folder named bin/ in the root directory of your project repository. Nextflow will automatically add this folder to the PATH environment variable, and the scripts will automatically be accessible in your pipeline without the need to specify an absolute path to invoke them.
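
As a rough illustration of that layout, a script placed in bin/ might start like the sketch below. The file name bin/k_means2.py comes from the question; the main function and its body are hypothetical placeholders, not the real script. The `#!/usr/bin/env python3` shebang resolves to the first python3 on the PATH, so it follows whichever environment is active when the process runs.

```python
#!/usr/bin/env python3
"""Hypothetical skeleton for bin/k_means2.py; the real processing is omitted."""
import sys


def main(paths):
    """Handle each input path given on the command line; return the count handled."""
    handled = 0
    for path in paths:
        # Placeholder: the actual script would load and process the file here.
        print(f"processing {path}")
        handled += 1
    return handled


if __name__ == "__main__":
    main(sys.argv[1:])
```

Once the file is marked executable (chmod +x bin/k_means2.py), the process script could call `k_means2.py image-1.npy` instead of `python3 /home/hq/cv_proj/k_means2.py image-1.npy`.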

Then the problem is how to manage your Python dependencies. You mentioned Docker is not an option. Is Conda also not an option? The config for Conda might look something like:

    name: myenv
    channels:
      - conda-forge
      - bioconda
      - defaults
    dependencies:
      - conda-forge::matplotlib-base=3.4.3
      - conda-forge::numpy=1.21.2
      - conda-forge::opencv=4.5.2

Then if the above is in a file called environment.yml, create the environment with:

    conda env create

See also the best practices for using Conda.

Steve answered May 18 '26 21:05



