My folder structure is as follows:
Project/
--Pipeline.py
--setup.py
--dist/
  --ResumeParserDependencies-0.1.tar.gz
--Dependencies/
  --Module1.py
  --Module2.py
  --Module3.py
My setup.py file looks like this:
from setuptools import setup, find_packages

setup(
    name='ResumeParserDependencies',
    version='0.1',
    description='Dependencies',
    install_requires=[
        'google-cloud-storage==1.11.0',
        'requests==2.19.1',
        'urllib3==1.23'
    ],
    packages=['Dependencies']
)
I used the setup.py file to build a source distribution with 'python setup.py sdist', which placed ResumeParserDependencies-0.1.tar.gz in the dist folder. I then specified
setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz'] in my pipeline options.
However, when I run my pipeline on Dataflow, I get the error 'No module named ResumeParserDependencies'. If I run 'pip install ResumeParserDependencies-0.1.tar.gz' locally, the package installs and shows up in 'pip freeze'.
What am I missing to load the package into Dataflow?
I changed my folder structure and got this to work:
Project/
--Pipeline.py
--setup.py
--Module1/
  --__init__.py
--Module2/
  --__init__.py
--Module3/
  --__init__.py
The setup.py file now looks like this:

from setuptools import setup, find_packages

setup(
    name='ResumeParserDependencies',
    version='0.1',
    description='Dependencies',
    install_requires=[
        'google-cloud-storage==1.11.0',
        'urllib3==1.23'
    ],
    packages=find_packages()
)
In my pipeline, I specified:
setup_options.setup_file = './setup.py'
And I didn't need:
setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz']
Reference: find_packages doesn't find my Python file