Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pipeline code spanning multiple files in Apache Beam / Dataflow

After a lengthy search, I haven't found an example of a Dataflow / Beam pipeline that spans several files. Beam docs do suggest a file structure (under the section "Multiple File Dependencies"), but the Juliaset example they give has in effect a single code/source file (and the main file that calls it). Based on the Juliaset example, I need a similar file structure:

juliaset/__init__.py
juliaset/juliaset.py # actual code
juliaset/some_conf.py
__init__.py
juliaset_main.py
setup.py

Now I want to import .some_conf from juliaset/juliaset.py, which works when run locally but gives me an error when run on Dataflow

INFO:root:2017-12-15T17:34:09.333Z: JOB_MESSAGE_ERROR: (8cdf3e226105b90a): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 706, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 446, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py", line 247, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 363, in load_session
    module = unpickler.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1133, in load_reduce
    value = func(*args)
  File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 767, in _import_module
    return getattr(__import__(module, None, None, [obj]), obj)
ImportError: No module named package_name.juliaset.some_conf

A full working example would be very much appreciated!

like image 269
Mattias Arro Avatar asked Oct 29 '22 21:10

Mattias Arro


1 Answers

Can you verify your setup.py containing a structure like:

import setuptools

setuptools.setup(
    name='My Project',
    version='1.0',
    install_requires=[],
    packages=setuptools.find_packages(),
)

Import your modules like from juliaset.juliaset import SomeClass

And when you call the Python script, use python -m juliaset_main (without the .py)

Not sure if you already tried this, but just to be sure.

like image 141
Martin van Dam Avatar answered Nov 01 '22 15:11

Martin van Dam