
Airflow pull docker image from private google container repository

I am using the https://github.com/puckel/docker-airflow image to run Airflow. I had to add pip install docker in order for it to support DockerOperator.

Everything seems ok, but I can't figure out how to pull an image from a private google docker container repository.

I tried adding a connection of type Google Cloud in the admin section and running the DockerOperator like this:

    t2 = DockerOperator(
            task_id='docker_command',
            image='eu.gcr.io/project/image',
            api_version='2.3',
            auto_remove=True,
            command="/bin/sleep 30",
            docker_url="unix://var/run/docker.sock",
            network_mode="bridge",
            docker_conn_id="google_con"
    )

But I always get an error:

[2019-11-05 14:12:51,162] {{taskinstance.py:1047}} ERROR - No Docker registry URL provided

I also tried the dockercfg_path option:

    t2 = DockerOperator(
            task_id='docker_command',
            image='eu.gcr.io/project/image',
            api_version='2.3',
            auto_remove=True,
            command="/bin/sleep 30",
            docker_url="unix://var/run/docker.sock",
            network_mode="bridge",
            dockercfg_path="/usr/local/airflow/config.json",

    )

I get the following error:

[2019-11-06 13:59:40,522] {{docker_operator.py:194}} INFO - Starting docker container from image eu.gcr.io/project/image [2019-11-06 13:59:40,524] {{taskinstance.py:1047}} ERROR - ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

I also tried using only dockercfg_path="config.json" and got the same error.

I can't really use the BashOperator to run docker login either, as the docker command is not available in the container:

    t3 = BashOperator(
            task_id='print_hello',
            bash_command='docker login -u _json_key -p /usr/local/airflow/config.json eu.gcr.io'
    )

line 1: docker: command not found

What am I missing?
Tomaž Bratanič asked Nov 06 '19


3 Answers

airflow.hooks.docker_hook.DockerHook uses the docker_default connection when none is configured.

In your first attempt, you set google_con as the docker_conn_id, and the error thrown shows that the host (i.e. the registry name) isn't configured in that connection.

Here are a couple of changes to do:

  • The image argument passed to DockerOperator should be set to the image name without the registry name prefixing it.

    DockerOperator(api_version='1.21',
        # docker_url='tcp://localhost:2375', # Set your docker URL
        command='/bin/ls',
        image='image',
        network_mode='bridge',
        task_id='docker_op_tester',
        docker_conn_id='google_con',
        dag=dag,
        # added this to map to host path in MacOS
        host_tmp_dir='/tmp',
        tmp_dir='/tmp',
    )
  • Provide the registry name, username, and password in your google_con connection so the underlying DockerHook can authenticate to the registry.

You can obtain long-lived credentials for authentication from a service account key. For the username, use _json_key, and in the password field paste the contents of the JSON key file.

(screenshot: the google_con connection configured for Docker in the Airflow UI)
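The field mapping described above can be sketched as a plain dict, which makes it easy to see which value goes where. This is a minimal illustration, assuming a service-account key already downloaded and parsed; the helper name build_docker_conn_fields is made up for the example, not an Airflow API:

```python
import json

# Hypothetical helper (not an Airflow API): shows which value goes in
# which connection field when authenticating to GCR with a JSON key.
def build_docker_conn_fields(registry_host, sa_key):
    return {
        "conn_type": "docker",
        "host": registry_host,           # registry name, e.g. eu.gcr.io/project
        "login": "_json_key",            # literal username for key-file auth
        "password": json.dumps(sa_key),  # the raw contents of the key file
    }
```

With these values in hand, fill in the corresponding fields of the google_con connection in the Admin UI.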

Here are logs from running my task:

[2019-11-16 20:20:46,874] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:46,874] {dagbag.py:88} INFO - Filling up the DagBag from /Users/r7/OSS/airflow/airflow/example_dags/example_docker_operator.py
[2019-11-16 20:20:47,054] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:47,054] {cli.py:592} INFO - Running <TaskInstance: docker_sample.docker_op_tester 2019-11-14T00:00:00+00:00 [running]> on host 1.0.0.127.in-addr.arpa
[2019-11-16 20:20:47,074] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,074] {local_task_job.py:120} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.989537 s
[2019-11-16 20:20:47,088] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,088] {base_hook.py:89} INFO - Using connection to: id: google_con. Host: gcr.io/<redacted-project-id>, Port: None, Schema: , Login: _json_key, Password: XXXXXXXX, extra: {}
[2019-11-16 20:20:48,404] {docker_operator.py:209} INFO - Starting docker container from image alpine
[2019-11-16 20:20:52,066] {logging_mixin.py:89} INFO - [2019-11-16 20:20:52,066] {local_task_job.py:99} INFO - Task exited with return code 0
Oluwafemi Sule answered Sep 23 '22


I know the question is about GCR, but it's worth noting that other container registries may expect the config in a different format.

For example, GitLab expects you to pass the fully qualified image name in the DAG and only put the GitLab container registry host name in the connection:

DockerOperator(
    task_id='docker_command',
    image='registry.gitlab.com/group/project/image:tag',
    api_version='auto',
    docker_conn_id='gitlab_registry',
)

Then set up your gitlab_registry connection like:

docker://gitlab+deploy-token-1234:[email protected]
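One gotcha with that URI form: if the deploy token (or username) contains characters that are not URI-safe, they need percent-encoding before going into the connection string. A minimal sketch using only the standard library; the credential values here are placeholders:

```python
from urllib.parse import quote

# Sketch: build the docker:// connection URI, percent-encoding the
# username and token so characters like '+', '/' or '@' don't break parsing.
def docker_conn_uri(user, token, host):
    return "docker://{}:{}@{}".format(
        quote(user, safe=""), quote(token, safe=""), host
    )
```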
Tamlyn answered Sep 22 '22


Further to @Tamlyn's answer, we can also skip creating the connection (docker_conn_id) in Airflow and use it with GitLab as follows:

  1. On your development machine:
  • https://gitlab.com/yourgroup/yourproject/-/settings/repository (create a deploy token here and note the credentials for logging in)
  • docker login registry.gitlab.com (log in to Docker from this machine so you can push the image; enter your GitLab credentials when prompted)
  • docker build -t registry.gitlab.com/yourgroup/yourproject . && docker push registry.gitlab.com/yourgroup/yourproject (builds the image and pushes it to your project repo's container registry)
  2. On your airflow machine:
  • https://gitlab.com/yourgroup/yourproject/-/settings/repository (you can use the token created above for logging in)
  • docker login registry.gitlab.com (log in to Docker from this machine so it can pull the image, which skips the need for creating a Docker registry connection; enter your GitLab credentials when prompted. This generates ~/.docker/config.json, which is required per the Docker docs)
  3. In your dag:
dag = DAG(
    "dag_id",
    default_args = default_args,
    schedule_interval = "15 1 * * *"
)

docker_trigger = DockerOperator(
    task_id = "task_id",
    api_version = "auto",
    network_mode = "bridge",
    image = "registry.gitlab.com/yourgroup/yourproject",
    auto_remove = True, # use if required
    force_pull = True, # use if required
    xcom_all = True, # use if required
    # tty = True, # turning this on screws up the log rendering
    # command = "", # use if required
    environment = { # use if required
        "envvar1": "envvar1value",
        "envvar2": "envvar2value",
    },
    dag = dag,
)

This works on Ubuntu 20.04.2 LTS (tried and tested) with Airflow installed on the instance.
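Because step 2 depends on docker login having written ~/.docker/config.json on the Airflow machine, a quick sanity check that the file contains an auth entry for the registry can save debugging time. A minimal sketch; the helper name is made up for this example:

```python
import json

# Hypothetical pre-flight check: verify that docker login actually left
# an auth entry for the registry in the parsed ~/.docker/config.json.
def has_registry_auth(config, registry):
    return registry in config.get("auths", {})
```

Load the file with json.load(open(os.path.expanduser('~/.docker/config.json'))) and pass the resulting dict together with 'registry.gitlab.com'.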

imsheth answered Sep 25 '22