
Running docker operator from Google Cloud Composer

As per the documentation, Google Cloud Composer Airflow worker nodes are served from a dedicated Kubernetes cluster.


I have a Docker-containerized ETL step that I would like to run using Airflow, preferably on the same Kubernetes cluster that is hosting the workers, or alternatively on a dedicated cluster.

What would be the best practice for starting a Docker operator from the Cloud Composer Airflow environment?

Pragmatic solutions are ❤️

Maxim Veksler asked Jul 05 '18 07:07



1 Answer

Google Cloud Composer has just recently been released into General Availability, and with that you are now able to use a KubernetesPodOperator to launch pods into the same GKE cluster that the managed Airflow uses.

Make sure your Composer environment is at least version 1.0.0.

An example operator:

import datetime

from airflow import models
from airflow.contrib.operators import kubernetes_pod_operator

YESTERDAY = datetime.datetime.now() - datetime.timedelta(days=1)

with models.DAG(
        dag_id='composer_sample_kubernetes_pod',
        schedule_interval=datetime.timedelta(days=1),
        start_date=YESTERDAY) as dag:
    # Only name, namespace, image, and task_id are required to create a
    # KubernetesPodOperator. In Cloud Composer, the operator currently defaults
    # to the config file found at `/home/airflow/composer_kube_config` if no
    # `config_file` parameter is specified. By default that file contains the
    # credentials for the Google Kubernetes Engine cluster that is created
    # alongside the Cloud Composer environment.
    kubernetes_min_pod = kubernetes_pod_operator.KubernetesPodOperator(
        # The ID specified for the task.
        task_id='pod-ex-minimum',
        # Name of the task you want to run, used to generate the Pod ID.
        name='pod-ex-minimum',
        # The namespace to run within Kubernetes; the default namespace is
        # `default`. Launching pods there can starve the Airflow workers and
        # scheduler of resources within the Cloud Composer environment; the
        # recommended solution is to increase the number of nodes to satisfy
        # the computing requirements. Alternatively, launching pods into a
        # custom namespace avoids fighting over resources.
        namespace='default',
        # Docker image to use. Defaults to hub.docker.com, but any fully
        # qualified URL will point to a custom repository. Supports private
        # gcr.io images if the Composer environment is in the same project
        # as the gcr.io images.
        image='gcr.io/gcp-runtimes/ubuntu_16_0_4')

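For the Docker-contained ETL step from the question, a minimal sketch of the same operator could look like the following, placed inside the same `with models.DAG(...)` block as above. The image name, command, arguments, and environment variable are hypothetical placeholders for your own container:

    # Hypothetical ETL task; substitute your own image, entrypoint, and arguments.
    etl_step = kubernetes_pod_operator.KubernetesPodOperator(
        task_id='etl-step',
        name='etl-step',
        namespace='default',
        # Placeholder image name; use the gcr.io path of your ETL container.
        image='gcr.io/my-project/my-etl:latest',
        # Command and arguments run inside the container; `{{ ds }}` is the
        # templated execution date that Airflow substitutes at runtime.
        cmds=['python', 'run_etl.py'],
        arguments=['--date', '{{ ds }}'],
        # Optional: pass configuration to the container as environment variables.
        env_vars={'TARGET_BUCKET': 'gs://my-etl-output'})

Since the operator defaults to the Composer cluster's kube config, this runs the ETL container on the same GKE cluster that hosts the workers, which is the setup asked about.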
Additional resources:

  • KubernetesPodOperator docs
  • More KubernetesPodOperator examples
  • KubernetesPodOperator Airflow code
cjmoberg answered Nov 19 '22 12:11