Could somebody please clarify the expected behavior when using save_main_session and custom modules imported in __main__? My Dataflow pipeline imports two non-standard modules: one via requirements.txt and another via setup_file. Unless I move the imports into the functions where they are used, I keep getting import/pickling errors (sample error below). From the documentation, I assumed that setting save_main_session would solve this problem, but it does not. So I wonder whether I missed something or this behavior is by design. The same import works fine when placed inside a function.
Error:
File "/usr/lib/python2.7/pickle.py", line 1130, in find_class __import__(module) ImportError: No module named jmespath
Note: Dataflow no longer supports pipelines using Python 2. For more information, see the Python 2 support on Google Cloud page. Dataflow doesn't support Python 3.10; use Python version 3.9 or earlier.
https://cloud.google.com/dataflow/faq#how-do-i-handle-nameerrors https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
When to use --save_main_session: you can set the --save_main_session pipeline option to True. This will cause the state of the global namespace to be pickled and loaded on the Cloud Dataflow worker.
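A minimal sketch of enabling the option programmatically (it can equally be passed on the command line as --save_main_session); the runner and other options shown are placeholders:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    # Placeholder options; in practice you also pass project, region,
    # temp_location, and so on.
    options = PipelineOptions(['--runner=DataflowRunner'])
    # Equivalent to passing --save_main_session on the command line.
    options.view_as(SetupOptions).save_main_session = True

    with beam.Pipeline(options=options) as p:
        (p
         | beam.Create([{'foo': {'bar': 1}}])
         | beam.Map(lambda d: d['foo']['bar']))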
The setup that works best for me is having a dataflow_launcher.py sitting at the project root next to your setup.py. The only thing it does is import your pipeline file and launch it. Use setup.py to handle all your dependencies. This is the best example I've found so far: https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset
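As a rough sketch of that layout (package, module, and dependency names are made up for illustration):

    # setup.py -- declares the local package and its dependencies so the
    # Dataflow workers can install them (pass it with --setup_file=./setup.py).
    import setuptools

    setuptools.setup(
        name='my_pipeline_package',          # illustrative name
        version='0.0.1',
        packages=setuptools.find_packages(),
        install_requires=['jmespath'],       # illustrative dependency
    )

    # dataflow_launcher.py -- sits next to setup.py, only imports the
    # pipeline module from the package and launches it.
    from my_pipeline_package import my_pipeline  # illustrative module

    if __name__ == '__main__':
        my_pipeline.run()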
In particular, --save_main_session fails if your DoFn has an __init__ that uses super. See https://issues.apache.org/jira/browse/BEAM-6158.
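The failing pattern described in that issue looks roughly like this (class and field names are illustrative):

    import apache_beam as beam

    class MyDoFn(beam.DoFn):
        def __init__(self, field):
            # Calling super() in __init__, combined with --save_main_session,
            # is the combination reported to break pickling in BEAM-6158.
            super(MyDoFn, self).__init__()
            self.field = field

        def process(self, element):
            yield element.get(self.field)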