I have a .py pipeline using Apache Beam that imports another module (.py), which is my own custom module. I have a structure like this:
├── mymain.py
└── myothermodule.py
I import myothermodule.py in mymain.py like this:
import myothermodule
When I run locally with DirectRunner, I have no problem.
But when I run it on Dataflow with DataflowRunner, I get this error:
ImportError: No module named myothermodule
So I want to know what I should do so that this module is found when running the job on Dataflow.
When you run your pipeline remotely, you need to make any dependencies available on the remote workers too.
To do this, put your module file in a Python package: place it in a directory with an __init__.py file and create a setup.py next to your main script. It would look like this:
├── mymain.py
├── setup.py
└── othermodules
    ├── __init__.py
    └── myothermodule.py
And import it like this:
from othermodules import myothermodule
Then you can run your pipeline with the command-line option --setup_file ./setup.py
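If you prefer to assemble the options in code rather than typing them on the command line, a minimal sketch could look like this. It assumes apache_beam is installed; build_dataflow_args and run are hypothetical helper names (not part of Beam), and the project, region, and bucket values you pass in are placeholders for your own.

```python
def build_dataflow_args(project, region, temp_location, setup_file="./setup.py"):
    """Return the flags for a Dataflow run (hypothetical helper, not a Beam API)."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Tells Beam to package the local code so it can be installed on the workers.
        f"--setup_file={setup_file}",
    ]


def run(argv):
    # Imported lazily so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(argv)) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create([" a ", " b "])
            # Placeholder step; this is where code from othermodules would be used.
            | "Strip" >> beam.Map(str.strip)
        )
```

You would then call something like run(build_dataflow_args("my-project", "us-central1", "gs://my-bucket/tmp")), with those three values replaced by your own.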
A minimal setup.py file would look like this:
import setuptools
setuptools.setup(packages=setuptools.find_packages())
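Since find_packages() only picks up directories that contain an __init__.py, it can be worth checking what it actually discovers before submitting the job. The sketch below recreates the layout above in a temporary directory (so it does not touch your project) and compares the result with and without the __init__.py file; discovered_packages is a hypothetical helper name.

```python
import os
import tempfile

import setuptools


def discovered_packages(root):
    """Return the package names setuptools.find_packages sees under root."""
    return setuptools.find_packages(where=root)


# Recreate the layout from the answer in a scratch directory.
with tempfile.TemporaryDirectory() as root:
    pkg_dir = os.path.join(root, "othermodules")
    os.makedirs(pkg_dir)
    open(os.path.join(root, "mymain.py"), "w").close()
    open(os.path.join(pkg_dir, "myothermodule.py"), "w").close()

    # No __init__.py yet, so the directory is not reported as a package.
    before = discovered_packages(root)

    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    after = discovered_packages(root)

print(before)
print(after)
```

If othermodules is missing from the second list in your real project, the package would not be shipped to the workers and the ImportError would come back.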
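The whole setup is documented in the Apache Beam guide "Managing Python Pipeline Dependencies".
A complete example using this approach is the juliaset example in the Apache Beam repository.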