 

Broken DAG: (...) No module named docker

I have the BigQuery connectors all running, but I also have some existing scripts in Docker containers that I would like to schedule on Cloud Composer instead of App Engine Flexible.

I have the script below, which seems to follow the examples I can find:

import datetime
from airflow import DAG
from airflow.operators.docker_operator import DockerOperator

yesterday = datetime.datetime.combine(
    datetime.datetime.today() - datetime.timedelta(1),
    datetime.datetime.min.time())

default_args = {
    # Setting start date as yesterday starts the DAG immediately
    'start_date': yesterday,
    # If a task fails, retry it once after waiting at least 5 minutes
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
}

schedule_interval = '45 09 * * *'

dag = DAG('xxx-merge', default_args=default_args, schedule_interval=schedule_interval)

hfan = DockerOperator(
    task_id='hfan',
    image='gcr.io/yyyyy/xxxx',
    dag=dag,  # attach the task to the DAG so the scheduler picks it up
)

...but when I try to run it, the web UI tells me:

Broken DAG: [/home/airflow/gcs/dags/xxxx.py] No module named docker

Is it perhaps that Docker is not configured to work inside the Kubernetes cluster that Cloud Composer runs on? Or am I just missing something in the syntax?

asked May 09 '18 by MarkeD


2 Answers

This means that wherever your Airflow instance is installed, the Python package named docker is missing.

On a machine I configure myself, I can install the missing package with

pip install docker
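A quick sanity check after installing is to attempt the import directly; this exits silently on success and raises ImportError if the package is still missing:

python -c "import docker"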

EDIT

Within the source code of the docker operator (https://airflow.incubator.apache.org/_modules/airflow/operators/docker_operator.html) there is an import statement:

from docker import Client, tls

So the new error, cannot import name Client, looks to me like a broken install or a wrong version of the docker package: in docker 2.0 and later, Client was renamed to APIClient, so a too-new client triggers exactly this import error.
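If your Airflow build still uses that import, one workaround (a sketch; the pin below assumes docker-py 1.10.6 was the last release to still expose Client under the old PyPI name) is to swap the package:

pip uninstall docker
pip install docker-py==1.10.6

Both packages install the same docker module, which is why the newer one has to be removed first.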

answered Oct 28 '22 by tobi6


As explained in other answers, the Docker Python client is not preinstalled in Cloud Composer environments. To install it, add it as a PyPI dependency in your environment's configuration.
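For example, with the gcloud CLI (the environment name, location, and version pin below are placeholders to adapt):

gcloud composer environments update ENVIRONMENT_NAME \
    --location LOCATION \
    --update-pypi-package docker==2.7.0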

Caveat: by default, DockerOperator will try to talk to the Docker API at /var/run/docker.sock to manage containers. This socket is not mounted inside Composer Airflow worker pods, and manually configuring it to do so is not recommended. Use of DockerOperator is only recommended in Composer if configured to talk to Docker daemons running outside of your environments.
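If you do run a Docker daemon outside the environment, the operator can be pointed at it through its docker_url parameter, which defaults to the local Unix socket; the host below is a placeholder:

hfan = DockerOperator(
    task_id='hfan',
    image='gcr.io/yyyyy/xxxx',
    docker_url='tcp://your-docker-host:2375',  # remote Docker API endpoint (placeholder)
    dag=dag,
)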

To avoid more brittle configuration or surprises from bypassing Kubernetes (since it is responsible for managing containers across the entire cluster), you should use the KubernetesPodOperator. If you are launching containers into a GKE cluster (or the Composer environment's cluster), then you can use GKEPodOperator, which has more specific GCP-related parameters.
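As a sketch, the task from the question could look like this with KubernetesPodOperator; the import path matches the Airflow 1.x contrib layout that Composer shipped at the time, and the name/namespace values are assumptions:

from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

hfan = KubernetesPodOperator(
    task_id='hfan',
    name='hfan',                # pod name within the cluster
    namespace='default',        # Kubernetes namespace to launch into (assumption)
    image='gcr.io/yyyyy/xxxx',  # the same container image as in the question
    dag=dag,
)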

answered Oct 28 '22 by hexacyanide