
How can I install a python package onto Google Dataflow and import it into my pipeline?

My folder structure is as follows:

Project/
--Pipeline.py
--setup.py
--dist/
    --ResumeParserDependencies-0.1.tar.gz
--Dependencies/
    --Module1.py
    --Module2.py
    --Module3.py

My setup.py file looks like this:

from setuptools import setup, find_packages

setup(
    name='ResumeParserDependencies',
    version='0.1',
    description='Dependencies',
    install_requires=[
        'google-cloud-storage==1.11.0',
        'requests==2.19.1',
        'urllib3==1.23',
    ],
    packages=['Dependencies'],
)

I used setup.py to build a source distribution with 'python setup.py sdist'. The resulting archive is dist/ResumeParserDependencies-0.1.tar.gz. In my pipeline options, I then specified:

setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz']

However, when I run the pipeline on Dataflow, I get the error 'No module named ResumeParserDependencies'. If I run 'pip install ResumeParserDependencies-0.1.tar.gz' locally, the package installs and shows up in 'pip freeze'.


What am I missing to load the package into Dataflow?

Melissa Guo asked Sep 08 '18 21:09


1 Answer

I changed my folder structure and got this to work:

Project/
--Pipeline.py
--setup.py
--Module1/
    --__init__.py
--Module2/
    --__init__.py
--Module3/
    --__init__.py

The setup.py file now looks like this:

from setuptools import setup, find_packages

setup(
    name='ResumeParserDependencies',
    version='0.1',
    description='Dependencies',
    install_requires=[
        'google-cloud-storage==1.11.0',
        'urllib3==1.23',
    ],
    packages=find_packages(),
)
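
The new layout matters because find_packages() only reports directories that contain an __init__.py file. Here is a minimal sketch of that behaviour, assuming setuptools is installed. The helper name discover and the temporary directories are hypothetical and exist only for illustration. The first layout shows what find_packages() would see in a folder of plain .py files like the original Dependencies/ (the original setup.py listed that package by hand instead of calling find_packages()).

```python
import tempfile
from pathlib import Path

from setuptools import find_packages


def discover(layout):
    """Create empty files for `layout` in a temp dir and return what find_packages() sees."""
    with tempfile.TemporaryDirectory() as root:
        for relative_path in layout:
            path = Path(root) / relative_path
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        return sorted(find_packages(where=root))


# A folder of plain .py files with no __init__.py, as in the original layout.
old_layout = ["Dependencies/Module1.py", "Dependencies/Module2.py"]

# The restructured layout: one directory per module, each with __init__.py.
new_layout = ["Module1/__init__.py", "Module2/__init__.py", "Module3/__init__.py"]

print(discover(old_layout))  # []
print(discover(new_layout))  # ['Module1', 'Module2', 'Module3']
```

With the restructured layout, the pipeline code imports Module1, Module2, and Module3 by those names. ResumeParserDependencies is only the distribution name passed to setup(), not an importable module.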

In my pipeline, I specified:

setup_options.setup_file = './setup.py'

And I didn't need:

setup_options.extra_packages = ['./dist/ResumeParserDependencies-0.1.tar.gz']
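
For context, here is a rough sketch of how the setup_file setting can be wired into the pipeline options, assuming the Apache Beam Python SDK (apache-beam[gcp]). The function names build_dataflow_args and make_options and the project and bucket values are placeholders, not taken from the original pipeline. The --setup_file flag is the command-line form of setup_options.setup_file.

```python
def build_dataflow_args(project, temp_location, setup_file="./setup.py"):
    """Return command-line style flags for a Dataflow run (placeholder values)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--temp_location={temp_location}",
        # Dataflow workers build and install the package described by setup.py,
        # so no prebuilt tarball (extra_packages) is passed here.
        f"--setup_file={setup_file}",
    ]


def make_options(args):
    """Build PipelineOptions from the flags; needs apache-beam installed."""
    # Imported here so the flag builder above works without Beam installed.
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    options = PipelineOptions(args)
    setup_options = options.view_as(SetupOptions)
    # Same effect as the attribute assignment shown above.
    setup_options.setup_file = "./setup.py"
    return options


args = build_dataflow_args("my-project", "gs://my-bucket/tmp")
print(args)
```

Calling make_options(args) then gives an options object that can be passed to beam.Pipeline(options=...).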

Reference: the question "find_packages doesn't find my Python file"

Melissa Guo answered Oct 24 '22 14:10