
Including another file in Dataflow Python flex template, ImportError

Is there an example of a Python Dataflow Flex Template with more than one file where the script is importing other files included in the same folder?

My project structure is like this:

├── pipeline
│   ├── __init__.py
│   ├── main.py
│   ├── setup.py
│   ├── custom.py

I'm trying to import custom.py inside of main.py for a dataflow flex template.

I receive the following error in the pipeline execution:

ModuleNotFoundError: No module named 'custom'

The pipeline works fine if I include all of the code in a single file and don't make any imports.
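The behaviour itself is standard Python import resolution rather than anything Dataflow-specific: a bare "import custom" only succeeds if the directory containing custom.py is on sys.path (or the module is installed as a package). A minimal, self-contained sketch reproducing this outside Dataflow (all paths hypothetical, created in a temp directory):

```python
# Demonstrates why "import custom" fails: Python resolves top-level
# imports against sys.path, so a file that sits next to your script
# locally is invisible once the launcher runs from a different directory.
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    pkg = os.path.join(tmp, "pipeline")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "custom.py"), "w") as f:
        f.write("VALUE = 42\n")

    # cwd is the parent dir, so "pipeline/custom.py" is NOT on sys.path:
    # the import fails with ModuleNotFoundError, as in the template run.
    fail = subprocess.run(
        [sys.executable, "-c", "import custom"],
        cwd=tmp, capture_output=True, text=True,
    )
    print("without path, exit code:", fail.returncode)

    # Putting the directory on PYTHONPATH (which is, in effect, what
    # installing the code via setup.py achieves) fixes the import.
    env = dict(os.environ, PYTHONPATH=pkg)
    ok = subprocess.run(
        [sys.executable, "-c", "import custom; print(custom.VALUE)"],
        cwd=tmp, env=env, capture_output=True, text=True,
    )
    print("with path, output:", ok.stdout.strip())
```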

Example Dockerfile:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/dataflow/template/pipeline
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

COPY pipeline /dataflow/template/pipeline

COPY spec/python_command_spec.json /dataflow/template/

ENV DATAFLOW_PYTHON_COMMAND_SPEC /dataflow/template/python_command_spec.json

RUN pip install avro-python3 pyarrow==0.11.1 apache-beam[gcp]==2.24.0

ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/main.py"

Python spec file:

{
    "pyFile":"/dataflow/template/pipeline/main.py"
}
  

I am deploying the template with the following command:

gcloud builds submit --project=${PROJECT} --tag ${TARGET_GCR_IMAGE} .
asked Nov 18 '20 by Akshay Apte

2 Answers

I actually solved this by passing an additional setup_file parameter at template execution time. The setup_file parameter also needs to be added to the template metadata:

--parameters setup_file="/dataflow/template/pipeline/setup.py"
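Since the parameter has to be declared in the template metadata as well, the entry can be added like any other runtime parameter. A sketch of what it might look like (field values illustrative; the surrounding structure follows the standard Flex Template metadata format):

```json
{
    "name": "My pipeline",
    "parameters": [
        {
            "name": "setup_file",
            "label": "Setup file",
            "helpText": "Container-local path to setup.py, e.g. /dataflow/template/pipeline/setup.py",
            "isOptional": true
        }
    ]
}
```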

Apparently the line ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py" in the Dockerfile has no effect and doesn't actually pick up the setup file.

My setup file looked like this:

import setuptools

setuptools.setup(
    packages=setuptools.find_packages(),
    install_requires=[
        'apache-beam[gcp]==2.24.0'
    ],
)
answered Nov 05 '22 by Akshay Apte

After some tests I found out that, for some unknown reason, Python files in the working directory (WORKDIR) cannot be referenced with an import. It does work, however, if you create a subfolder and move the Python dependencies into it. I tested this, and in your use case the following structure works:

├── pipeline
│   ├── main.py
│   ├── setup.py
│   ├── mypackage
│   │   ├── __init__.py
│   │   ├── custom.py

And you will be able to reference it with: import mypackage.custom. The Dockerfile should move custom.py into the proper directory:

RUN mkdir -p ${WORKDIR}/mypackage
RUN touch ${WORKDIR}/mypackage/__init__.py
COPY custom.py ${WORKDIR}/mypackage

And the dependency will be added to the Python installation directory:

$ docker exec -it <container> /bin/bash
# find / -name custom.py
/usr/local/lib/python3.7/site-packages/mypackage/custom.py
answered Nov 05 '22 by rsantiago