I am very new to Docker + Airflow. Below is what I am trying to accomplish.

I have 4 services, as shown in the compose file below: 3 are related to Airflow and one is a test Ubuntu instance. The Airflow-related containers (airflow-database, airflow-webserver, airflow-scheduler) are able to communicate with each other, and I am able to run the example DAGs.

Now I have added a 4th service (ubuntu), to which I am trying to send a simple command, "/bin/sleep 10", from a DAG using DockerOperator (the DAG file is below). But for some reason I am getting a Permission Denied message (the DAG error log is attached as well). It works if I run Airflow on localhost instead of from inside a Docker container. I am unable to figure out what I am missing. Here are some of the things I tried:
- Replaced unix://var/run/docker.sock with tcp://172.20.0.1, thinking it would be able to resolve via the Docker host IP.
- Used gateway.host.internal.
- Even removed the docker_url option from the operator, but realized it gets defaulted to unix://var/run/docker.sock anyway.
- Tried a bunch of combinations: tcp://172.20.0.1:2376, tcp://172.20.0.1:2375.
- Mapped ports from the host to the Ubuntu service, like 8085:8085, etc.
- Ran the docker version command; not sure if the output is what it is supposed to be.

Thanks in advance for any help on what else I can try to make this work :)
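For reference, a quick way to check whether the daemon is reachable over the mounted socket at all is to query its /version endpoint from inside one of the Airflow containers (a diagnostic sketch; it assumes curl is available in the apache/airflow image):

    docker-compose exec airflow-scheduler curl -s --unix-socket /var/run/docker.sock http://localhost/version

If this prints the daemon's version JSON, the socket itself works; if it fails with "Permission denied", it is the same problem the DAG hits.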
docker-compose.yml
version: '3.2'
services:
  # Ubuntu Container
  ubuntu:
    image: ubuntu
    networks:
      - mynetwork

  # Airflow Database
  airflow-database:
    image: postgres:12
    env_file:
      - .env
    ports:
      - 5432:5432
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/database/data:/var/lib/postgresql/data/pgdata
      - ./airflow/database/logs:/var/lib/postgresql/data/log
    command: >
      postgres
      -c listen_addresses=*
      -c logging_collector=on
      -c log_destination=stderr
      -c max_connections=200
    networks:
      - mynetwork

  # Airflow DB Init
  initdb:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    depends_on:
      - airflow-database
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/metadata-airflow/dags:/opt/airflow/dags
      - ./airflow/logs:/opt/airflow/logs
    entrypoint: /bin/bash
    command: -c "airflow db init && airflow users create --firstname admin --lastname admin --email [email protected] --password admin --username admin --role Admin"
    networks:
      - mynetwork

  # Airflow Webserver
  airflow-webserver:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    depends_on:
      - airflow-database
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/metadata-airflow/dags:/opt/airflow/dags
      - ./airflow/logs:/opt/airflow/logs
    ports:
      - 8080:8080
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
    networks:
      - mynetwork

  # Airflow Scheduler
  airflow-scheduler:
    image: apache/airflow:2.0.0-python3.8
    env_file:
      - .env
    depends_on:
      - airflow-database
      - airflow-webserver
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./airflow/metadata-airflow/dags:/opt/airflow/dags
      - ./airflow/logs:/opt/airflow/logs
    deploy:
      restart_policy:
        condition: on-failure
        delay: 8s
        max_attempts: 3
    command: scheduler
    networks:
      - mynetwork

networks:
  mynetwork:
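As a sanity check, the socket mount and its ownership can be inspected from a running service; the group that owns the socket is what matters for the error further down (a sketch, assuming the service names above):

    docker-compose exec airflow-scheduler ls -l /var/run/docker.sock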
DAG File
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.providers.docker.operators.docker import DockerOperator
from airflow.utils.dates import days_ago

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
}

dag = DAG(
    'docker_sample',
    default_args=default_args,
    schedule_interval=None,
    start_date=days_ago(2),
)

t1 = DockerOperator(
    task_id='docker_op_tester',
    api_version='auto',
    image='ubuntu',
    docker_url='unix://var/run/docker.sock',
    auto_remove=True,
    command=[
        "/bin/bash",
        "-c",
        "/bin/sleep 30; "],
    network_mode='bridge',
    dag=dag,
)

t1
DAG Error Log
*** Reading local file: /opt/airflow/logs/docker_sample/docker_op_tester/2021-01-09T05:16:17.174981+00:00/1.log
[2021-01-09 05:16:26,726] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: docker_sample.docker_op_tester 2021-01-09T05:16:17.174981+00:00 [queued]>
[2021-01-09 05:16:26,774] {taskinstance.py:826} INFO - Dependencies all met for <TaskInstance: docker_sample.docker_op_tester 2021-01-09T05:16:17.174981+00:00 [queued]>
[2021-01-09 05:16:26,775] {taskinstance.py:1017} INFO -
--------------------------------------------------------------------------------
[2021-01-09 05:16:26,776] {taskinstance.py:1018} INFO - Starting attempt 1 of 1
[2021-01-09 05:16:26,776] {taskinstance.py:1019} INFO -
--------------------------------------------------------------------------------
[2021-01-09 05:16:26,790] {taskinstance.py:1038} INFO - Executing <Task(DockerOperator): docker_op_tester> on 2021-01-09T05:16:17.174981+00:00
[2021-01-09 05:16:26,794] {standard_task_runner.py:51} INFO - Started process 1057 to run task
[2021-01-09 05:16:26,817] {standard_task_runner.py:75} INFO - Running: ['airflow', 'tasks', 'run', 'docker_sample', 'docker_op_tester', '2021-01-09T05:16:17.174981+00:00', '--job-id', '360', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/example_docker.py', '--cfg-path', '/tmp/tmp4phq52dv']
[2021-01-09 05:16:26,821] {standard_task_runner.py:76} INFO - Job 360: Subtask docker_op_tester
[2021-01-09 05:16:26,932] {logging_mixin.py:103} INFO - Running <TaskInstance: docker_sample.docker_op_tester 2021-01-09T05:16:17.174981+00:00 [running]> on host 367f0fc7d092
[2021-01-09 05:16:27,036] {taskinstance.py:1230} INFO - Exporting the following env vars:
[email protected]
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=docker_sample
AIRFLOW_CTX_TASK_ID=docker_op_tester
AIRFLOW_CTX_EXECUTION_DATE=2021-01-09T05:16:17.174981+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-01-09T05:16:17.174981+00:00
[2021-01-09 05:16:27,054] {taskinstance.py:1396} ERROR - ('Connection aborted.', PermissionError(13, 'Permission denied'))
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/home/airflow/.local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
PermissionError: [Errno 13] Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 726, in urlopen
    retries = retries.increment(
  File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/util/retry.py", line 410, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/packages/six.py", line 734, in reraise
    raise value.with_traceback(tb)
  File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
    httplib_response = self._make_request(
  File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 392, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/home/airflow/.local/lib/python3.8/site-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', PermissionError(13, 'Permission denied'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1086, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1260, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1300, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/docker/operators/docker.py", line 286, in execute
    if self.force_pull or not self.cli.images(name=self.image):
  File "/home/airflow/.local/lib/python3.8/site-packages/docker/api/image.py", line 89, in images
    res = self._result(self._get(self._url("/images/json"), params=params),
  File "/home/airflow/.local/lib/python3.8/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/docker/api/client.py", line 230, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', PermissionError(13, 'Permission denied'))
[2021-01-09 05:16:27,073] {taskinstance.py:1433} INFO - Marking task as FAILED. dag_id=docker_sample, task_id=docker_op_tester, execution_date=20210109T051617, start_date=20210109T051626, end_date=20210109T051627
[2021-01-09 05:16:27,136] {local_task_job.py:118} INFO - Task exited with return code 1
Specs:
- Docker: version 20.10.2, API version 1.41
- Airflow image: apache/airflow:2.0.0-python3.8
- Host system: macOS Big Sur
I think I got it. Source: https://tomgregory.com/running-docker-in-docker-on-windows

Check the group ID that owns the Docker socket:

    docker run --rm -v /var/run/docker.sock:/var/run/docker.sock debian:buster-slim stat -c %g /var/run/docker.sock

This returns "1001" for me.
Add a group_add entry referring to this group ID to the Airflow services in your docker-compose.yml:

    image: apache/airflow:2.0.0-python3.8
    group_add:
      - 1001
I added it to both the webserver and the scheduler (not sure whether both need it), and it seems to work for me now (at least it crashes at a later point ;-)
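For clarity, this is roughly how the webserver service looks with the change in place (a sketch based on the compose file from the question; the scheduler gets the same two lines, and the 1001 is whatever your stat command printed):

    airflow-webserver:
      image: apache/airflow:2.0.0-python3.8
      group_add:
        - 1001  # group ID of /var/run/docker.sock from the stat command above
      env_file:
        - .env
      ...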
Edit: you also need to add

    AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True

as an environment variable in Airflow, otherwise the container crashes on exit (https://github.com/apache/airflow/issues/13487).
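Since every service in the compose file already loads a shared .env file via env_file, that file is a convenient place to set it (a sketch; any mechanism that exports the variable into the webserver and scheduler containers works):

    # .env
    AIRFLOW__CORE__ENABLE_XCOM_PICKLING=True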