I have a folder tree like this in my project
I create an Airflow service in a Docker container with:
Dockerfile

# Base image
FROM puckel/docker-airflow:1.10.1
# Impersonate the root user
USER root
# Logs are written straight to the I/O stream and not buffered
ENV PYTHONUNBUFFERED 1
ENV AIRFLOW_HOME=/usr/local/airflow
ENV PYTHONPATH "${PYTHONPATH}:/libraries"
WORKDIR /
# Add the Docker resource files to the image
ADD ./docker_resources ./docker_resources
# Install libraries and dependencies
RUN apt-get update && apt-get install -y vim
RUN pip install --user psycopg2-binary
RUN pip install -r docker_resources/requirements.pip
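As a quick sanity check that the ENV PYTHONPATH line survives into the running container, something like this can be run inside it (for example via docker-compose exec webserver python); a minimal sketch:

import sys

# If the ENV PYTHONPATH line in the Dockerfile took effect, /libraries
# should appear on the interpreter's search path.
print('/libraries' in sys.path)
print(sys.path)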
docker-compose.yml

version: '3'
services:
  postgres:
    image: postgres:9.6
    container_name: "postgres"
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./libraries:/libraries
      - ./python_scripts:/python_scripts
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
  scheduler:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./logs:/usr/local/airflow/logs
    ports:
      - "8793:8793"
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-scheduler.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
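One thing worth checking on the host: the left-hand side of each volumes entry is resolved relative to the directory containing docker-compose.yml, and as far as I know Docker creates a missing host directory as an empty one rather than failing. A small host-side check (a sketch, meant to be run from the directory that holds docker-compose.yml):

import os

# Host folders the compose file bind-mounts; if any of these is missing
# here, the container ends up with an empty directory at the mount point.
for rel in ('dags', 'libraries', 'python_scripts', 'logs'):
    print(rel, '->', os.path.isdir(os.path.abspath(rel)))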
My dags folder has a tutorial DAG with:
from datetime import timedelta

# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': '@daily',
}

dag = DAG(
    'Tutorial',
    default_args=default_args,
    description='A simple tutorial DAG with production tables',
    catchup=False,
)

task_1 = BashOperator(
    task_id='my_task',
    bash_command='python /python_scripts/my_script.py',
    dag=dag,
)
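While debugging this, it can help to let a task report what its own process can actually see, since that is exactly the container and user the real task runs with. A minimal sketch that could be appended to the same DAG file (check_mounts and _list_mounts are names I made up for illustration):

import os
from airflow.operators.python_operator import PythonOperator

def _list_mounts():
    # Runs inside whichever container executes the task, so it shows
    # precisely what the task process can reach.
    for path in ('/python_scripts', '/libraries'):
        if os.path.isdir(path):
            print(path, os.listdir(path))
        else:
            print(path, 'NOT MOUNTED')

check_mounts = PythonOperator(
    task_id='check_mounts',
    python_callable=_list_mounts,
    dag=dag,
)

If /python_scripts shows up as NOT MOUNTED in the task log but is visible when you exec into the webserver, the task is being executed in a container that lacks the mount.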
I tried changing bash_command='python /python_scripts/my_script.py' to:

bash_command='python python_scripts/my_script.py'
bash_command='python ~/../python_scripts/my_script.py'
bash_command='python ~/python_scripts/my_script.py'

and all of them fail. I tried them because BashOperator runs the command in a temporary folder.
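Because of that temporary working directory, anything relative in bash_command is resolved against an unpredictable location, so only absolute paths are reliable. One defensive pattern (just a sketch; SCRIPTS_DIR is a constant I introduced) is to build the command from an absolute base path:

import os

# Assumed constant: the absolute path where the scripts are mounted.
SCRIPTS_DIR = '/python_scripts'

task_1 = BashOperator(
    task_id='my_task',
    # os.path.join keeps the path absolute no matter which temporary
    # directory BashOperator runs the command from.
    bash_command='python {}'.format(os.path.join(SCRIPTS_DIR, 'my_script.py')),
    dag=dag,
)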
If I get into the machine and run the ls command, I find the file under python_scripts. Even running python /python_scripts/my_script.py from /usr/local/airflow works.
The error is always:

INFO - python: can't open file

I searched around and people have solved this issue with absolute paths, but I can't get it fixed.
Edit

If I add ADD ./ ./ below WORKDIR / in the Dockerfile, and delete these volumes from docker-compose.yml:

1. ./libraries:/libraries
2. ./python_scripts:/python_scripts

the error is no longer "file not found" but "libraries not found" (an import module error). That is an improvement, but it doesn't make sense, because PYTHONPATH is defined to include the /libraries folder.

The volumes make more sense than the ADD statement, because I need code changes to be applied inside the container instantly.
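To check whether that ImportError really is a search-path problem, the import can be tested inside the running container; a minimal sketch (my_library is a placeholder for whatever module actually lives under /libraries):

import sys

try:
    import my_library  # placeholder: substitute a real module from /libraries
    print('import ok')
except ImportError as exc:
    print('import failed:', exc)
    print('search path:', sys.path)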
Edit 2: The volumes are mounted, but no files are inside the container folders, which is why it can't find them. When I use ADD ./ ./, the folders do contain the files, since ADD copies everything in; but even then it doesn't work, because the libraries are still not found.
Did you try

bash_command='python /usr/local/airflow/python_scripts/my_script.py'

And you have to check that the folder has the right permissions (access and execute for your user).
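Concretely, that suggestion would make the task look like this (a sketch; the path assumes the scripts directory ends up under AIRFLOW_HOME):

task_1 = BashOperator(
    task_id='my_task',
    # Full path under AIRFLOW_HOME; also make sure the airflow user can
    # read and traverse the folder, e.g. chmod -R a+rx python_scripts.
    bash_command='python /usr/local/airflow/python_scripts/my_script.py',
    dag=dag,
)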