I am trying to write a Nextflow script that runs a Python script. The Python script imports seven modules, but when it is run from Nextflow, python3 cannot find two of them (cv2 and matplotlib) and crashes. If I call the script directly from bash, it works fine. I would like to avoid creating a Docker image to run this script.
Error executing process > 'grab_images (1)'
Caused by:
Process `grab_images (1)` terminated with an error exit status (1)
Command executed:
python3 --version
echo 'processing image-1.npy'
python3 /home/hq/cv_proj/k_means2.py image-1.npy
Command exit status:
1
Command output:
Python 3.7.3
processing image-1.npy
Command error:
Traceback (most recent call last):
File "/home/hq/cv_proj/k_means2.py", line 5, in <module>
import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'
Work dir:
/home/hq/cv_proj/work/7f/b787c62ec420b2b5eb490603ef913f
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
I think there is a path issue, as modules like numpy, sys, re, and time load successfully. How can I fix this?
Thanks in advance
UPDATE
To assist others who may have problems using Python in Nextflow scripts: make sure your shebang is correct. I was using
#!/usr/bin/python
instead of
#!/usr/bin/python3
All of my packages were installed with pip3 and I use python3 exclusively, so the shebang has to point to python3.
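When a module is found from bash but not from inside a Nextflow task, it can help to confirm which interpreter the task actually runs and which modules that interpreter can see. The following is a minimal diagnostic sketch, not part of the original script: the helper name `missing_modules` is hypothetical, and the module list is taken from the question. It relies only on the standard library.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter actually running this file; compare it with `which python3`.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # Module names from the question; adjust to match your own imports.
    print("missing:", missing_modules(["numpy", "cv2", "matplotlib"]))
```

Running this once from bash and once from the process work dir should show whether the two environments resolve to different interpreters.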
Best to avoid absolute paths to your script(s) in your process declarations. This section of the docs is worth taking some time to read: https://www.nextflow.io/docs/latest/sharing.html#manage-dependencies, particularly the subsection on how to manage third party scripts:
Any third party script that does not need to be compiled (Bash, Python, Perl, etc) can be included in the pipeline project repository, so that they are distributed with it.
Grant the execute permission to these files and copy them into a folder named bin/ in the root directory of your project repository. Nextflow will automatically add this folder to the PATH environment variable, and the scripts will automatically be accessible in your pipeline without the need to specify an absolute path to invoke them.
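As an illustration of the bin/ approach, here is a hedged skeleton of what a script placed at bin/k_means2.py might look like. The real script's contents are not shown in the question, so the argument name `image` and the functions `parse_args` and `main` are assumptions. The `#!/usr/bin/env python3` shebang resolves python3 from PATH rather than hard-coding an interpreter location, so it also picks up an activated environment.

```python
#!/usr/bin/env python3
"""Hypothetical skeleton for bin/k_means2.py; the real processing logic is not shown."""
import argparse


def parse_args(argv=None):
    # One positional argument: the .npy file staged into the task work dir.
    parser = argparse.ArgumentParser(description="Process one .npy image file.")
    parser.add_argument("image", help="path to the input .npy file")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(f"processing {args.image}")
    # The actual k-means processing would go here.


if __name__ == "__main__":
    main()
```

Once the file is executable, the process script should be able to call it as `k_means2.py image-1.npy` without an absolute path or an explicit `python3` prefix.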
Then the problem is how to manage your Python dependencies. You mentioned Docker is not an option. Is Conda also not an option? The config for Conda might look something like:
name: myenv
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - conda-forge::matplotlib-base=3.4.3
  - conda-forge::numpy=1.21.2
  - conda-forge::opencv=4.5.2
Then if the above is in a file called environment.yml, create the environment with:
conda env create
See also the best practices for using Conda.