I am using the https://github.com/puckel/docker-airflow image to run Airflow. I had to add pip install docker for it to support DockerOperator.
Everything seems OK, but I can't figure out how to pull an image from a private Google Docker container repository.
I tried adding a connection of type Google Cloud in the admin section and running the DockerOperator as:
t2 = DockerOperator(
    task_id='docker_command',
    image='eu.gcr.io/project/image',
    api_version='2.3',
    auto_remove=True,
    command="/bin/sleep 30",
    docker_url="unix://var/run/docker.sock",
    network_mode="bridge",
    docker_conn_id="google_con"
)
But I always get an error:
[2019-11-05 14:12:51,162] {{taskinstance.py:1047}} ERROR - No Docker registry URL provided
I also tried the dockercfg_path option:
t2 = DockerOperator(
    task_id='docker_command',
    image='eu.gcr.io/project/image',
    api_version='2.3',
    auto_remove=True,
    command="/bin/sleep 30",
    docker_url="unix://var/run/docker.sock",
    network_mode="bridge",
    dockercfg_path="/usr/local/airflow/config.json",
)
I get the following error:
[2019-11-06 13:59:40,522] {{docker_operator.py:194}} INFO - Starting docker container from image eu.gcr.io/project/image
[2019-11-06 13:59:40,524] {{taskinstance.py:1047}} ERROR - ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
I also tried using only dockercfg_path="config.json" and got the same error.
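For reference, dockercfg_path points at the config.json that docker login normally writes; its auths section stores base64("username:password") per registry. A minimal sketch of what such a file looks like for GCR key-based auth (the key contents here are a placeholder for a real service account key file):

```python
import base64
import json

# Placeholder service-account key; in practice this is the full JSON key
# file downloaded from the GCP console.
service_account_key = json.dumps({"type": "service_account", "project_id": "project"})

# docker login stores credentials in config.json under "auths", as
# base64("username:password"). For GCR key auth the username is the
# literal string "_json_key" and the password is the key file contents.
auth = base64.b64encode(f"_json_key:{service_account_key}".encode()).decode()
config = {"auths": {"eu.gcr.io": {"auth": auth}}}
print(json.dumps(config, indent=2))
```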
I can't really use a BashOperator to try to docker login either, as it does not recognize the docker command:
t3 = BashOperator(
    task_id='print_hello',
    bash_command='docker login -u _json_key -p /usr/local/airflow/config.json eu.gcr.io'
)
line 1: docker: command not found
What am I missing?
airflow.hooks.docker_hook.DockerHook uses the docker_default connection when one isn't configured.
In your first attempt you set google_con for docker_conn_id, and the error thrown shows that the host (i.e. the registry name) isn't configured.
Here are a couple of changes to make:
The image argument passed to DockerOperator should be the image tag without the registry name prefixing it:
DockerOperator(
    api_version='1.21',
    # docker_url='tcp://localhost:2375', # Set your docker URL
    command='/bin/ls',
    image='image',
    network_mode='bridge',
    task_id='docker_op_tester',
    docker_conn_id='google_con',
    dag=dag,
    # added this to map to the host path on macOS
    host_tmp_dir='/tmp',
    tmp_dir='/tmp',
)
Set the registry name in the Host field of your google_con connection; it is used by DockerHook to authenticate to Docker. You can obtain long-lived credentials for authentication from a service account key: for the username, use _json_key, and in the password field paste the contents of the JSON key file.
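To make the field mapping concrete, here is a sketch of the connection as a plain dict (the host and key contents are placeholders for your own values; in the Airflow UI these map to the Host, Login and Password fields):

```python
import json

# Placeholder key contents; paste the full JSON key file you downloaded
# from the GCP console into the password field instead.
key_contents = json.dumps({"type": "service_account", "project_id": "my-project"})

# Field-by-field, a google_con connection (Admin -> Connections) might look like:
connection = {
    "conn_id": "google_con",
    "conn_type": "docker",
    "host": "eu.gcr.io/project",  # registry name only, no image
    "login": "_json_key",         # literal username for key-based GCR auth
    "password": key_contents,     # entire contents of the JSON key file
}
```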
Here are logs from running my task:
[2019-11-16 20:20:46,874] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:46,874] {dagbag.py:88} INFO - Filling up the DagBag from /Users/r7/OSS/airflow/airflow/example_dags/example_docker_operator.py
[2019-11-16 20:20:47,054] {base_task_runner.py:110} INFO - Job 443: Subtask docker_op_tester [2019-11-16 20:20:47,054] {cli.py:592} INFO - Running <TaskInstance: docker_sample.docker_op_tester 2019-11-14T00:00:00+00:00 [running]> on host 1.0.0.127.in-addr.arpa
[2019-11-16 20:20:47,074] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,074] {local_task_job.py:120} WARNING - Time since last heartbeat(0.01 s) < heartrate(5.0 s), sleeping for 4.989537 s
[2019-11-16 20:20:47,088] {logging_mixin.py:89} INFO - [2019-11-16 20:20:47,088] {base_hook.py:89} INFO - Using connection to: id: google_con. Host: gcr.io/<redacted-project-id>, Port: None, Schema: , Login: _json_key, Password: XXXXXXXX, extra: {}
[2019-11-16 20:20:48,404] {docker_operator.py:209} INFO - Starting docker container from image alpine
[2019-11-16 20:20:52,066] {logging_mixin.py:89} INFO - [2019-11-16 20:20:52,066] {local_task_job.py:99} INFO - Task exited with return code 0
I know the question is about GCR, but it's worth noting that other container registries may expect the config in a different format.
For example, Gitlab expects you to pass the fully qualified image name to the DAG and only put the Gitlab container registry host name in the connection:
DockerOperator(
    task_id='docker_command',
    image='registry.gitlab.com/group/project/image:tag',
    api_version='auto',
    docker_conn_id='gitlab_registry',
)
Then set up your gitlab_registry connection like:
docker://gitlab+deploy-token-1234:[email protected]
Further to @Tamlyn's answer, we can also skip the creation of the connection (docker_conn_id) from airflow and use it with gitlab as under:

On the machine that builds and pushes the image:
1. https://gitlab.com/yourgroup/yourproject/-/settings/repository (create a token here and get the details for logging in)
2. docker login registry.gitlab.com (log in to docker on the machine that pushes the image; enter your gitlab credentials when prompted)
3. docker build -t registry.gitlab.com/yourgroup/yourproject . && docker push registry.gitlab.com/yourgroup/yourproject (builds and pushes to your project repo's container registry)

On the machine that pulls the image:
1. https://gitlab.com/yourgroup/yourproject/-/settings/repository (you can use the token created above for logging in)
2. docker login registry.gitlab.com (log in to docker on the machine that pulls the image; enter your gitlab credentials when prompted. This skips the need for creating a docker registry connection, as it generates ~/.docker/config.json, which is required; see the reference in the docker docs)

dag = DAG(
    "dag_id",
    default_args = default_args,
    schedule_interval = "15 1 * * *"
)
docker_trigger = DockerOperator(
    task_id = "task_id",
    api_version = "auto",
    network_mode = "bridge",
    image = "registry.gitlab.com/yourgroup/yourproject",
    auto_remove = True, # use if required
    force_pull = True, # use if required
    xcom_all = True, # use if required
    # tty = True, # turning this on screws up the log rendering
    # command = "", # use if required
    environment = { # use if required
        "envvar1": "envvar1value",
        "envvar2": "envvar2value",
    },
    dag = dag,
)
This works with Ubuntu 20.04.2 LTS (tried and tested), with airflow installed on the instance.