I am trying to test my Dataflow pipeline on the DataflowRunner. The job always gets stuck at 1 hour 1 minute and then reports: "The Dataflow appears to be stuck." Digging through the Stackdriver logs for the job, I found this error: "Failed to install packages: failed to install workflow: exit status 1". I have seen other Stack Overflow posts saying this can happen when pip packages are incompatible. It is causing my worker startup to fail every time.
This is my current setup.py. Can someone please help me understand what I am missing? The job ID is 2018-02-09_08_22_34-6196858167817670597.
setup.py
from setuptools import setup, find_packages
requires = [
            'numpy==1.14.0',
            'google-cloud-storage==1.7.0',
            'pandas==0.22.0',
            'sqlalchemy-vertica[pyodbc,turbodbc,vertica-python]==0.2.5',
            'sqlalchemy==1.2.2',
            'apache_beam[gcp]==2.2.0',
            'google-cloud-dataflow==2.2.0'
            ]
setup(
    name="dataflow_pipeline_dependencies",
    version="1.0.0",
    description="Beam pipeline for flattening ism data",
    packages=find_packages(),
    install_requires=requires
)
Include the 'workflow' package in setup.py's install_requires. The error was resolved after adding it:
from setuptools import setup, find_packages
requires = [
            'numpy==1.14.0',
            'google-cloud-storage==1.7.0',
            'pandas==0.22.0',
            'sqlalchemy-vertica[pyodbc,turbodbc,vertica-python]==0.2.5',
            'sqlalchemy==1.2.2',
            'apache_beam[gcp]==2.2.0',
            'google-cloud-dataflow==2.2.0',
            'workflow' # Include this line
            ]
setup(
    name="dataflow_pipeline_dependencies",
    version="1.0.0",
    description="Beam pipeline for flattening ism data",
    packages=find_packages(),
    install_requires=requires
)
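For completeness, this setup.py only takes effect on the workers if it is passed to the pipeline via the --setup_file option when launching the job (the script name and project here are placeholders, not from the original post):

```shell
# Launch the pipeline on Dataflow, shipping setup.py so its
# install_requires are installed on each worker at startup.
python my_pipeline.py \
    --runner DataflowRunner \
    --project my-gcp-project \
    --temp_location gs://my-bucket/temp \
    --setup_file ./setup.py
```

If worker startup still fails, the pip output in the worker-startup Stackdriver logs usually names the exact package whose installation exited with status 1.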