Airflow running python files fails due to python: can't open file

I have a folder tree like this in my project

  • project
    • dags
    • python_scripts
    • libraries
    • docker-compose.yml
    • Dockerfile
    • docker_resources

I create an Airflow service in a Docker container with this Dockerfile:

#Base image
FROM puckel/docker-airflow:1.10.1

#Impersonate
USER root

#Logs are written straight to the I/O stream instead of being buffered.
ENV PYTHONUNBUFFERED 1

ENV AIRFLOW_HOME=/usr/local/airflow
ENV PYTHONPATH "${PYTHONPATH}:/libraries"

WORKDIR /
#Add Docker resource files into the image
ADD ./docker_resources ./docker_resources
#Install libraries and dependencies
RUN apt-get update && apt-get install -y vim
RUN pip install --user psycopg2-binary
RUN pip install -r docker_resources/requirements.pip


docker-compose.yml:
version: '3'
services:
  postgres:
    image: postgres:9.6
    container_name: "postgres"
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./libraries:/libraries
      - ./python_scripts:/python_scripts
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
  scheduler:
    build: .
    restart: always
    depends_on:
      - postgres
    volumes:
      - ./dags:/usr/local/airflow/dags
      - ./logs:/usr/local/airflow/logs
    ports:
      - "8793:8793"
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-scheduler.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3

My dags folder has a tutorial DAG with:

from datetime import timedelta
# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago
# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': days_ago(2),
    'email': ['[email protected] '],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 0,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': '@daily',
}

dag = DAG(
    'Tutorial',
    default_args=default_args,
    description='A simple tutorial DAG with production tables',
    catchup=False
)

task_1 = BashOperator(
    task_id='my_task',
    bash_command='python /python_scripts/my_script.py',
    dag=dag,
)

I tried changing bash_command='python /python_scripts/my_script.py' to:

  • bash_command='python python_scripts/my_script.py',
  • bash_command='python ~/../python_scripts/my_script.py',
  • bash_command='python ~/python_scripts/my_script.py',

All of them fail. I tried these variants because BashOperator runs its command from a temporary folder. If I get into the container and run ls, I can see the file under /python_scripts. Even running python /python_scripts/my_script.py from /usr/local/airflow works.
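One way to surface this kind of error at DAG-parse time instead of at task run time is to validate the path before handing it to BashOperator. A minimal sketch, where script_command is a hypothetical helper (not an Airflow API):

```python
import os

def script_command(path):
    """Return a 'python <path>' command, failing fast if the file is missing.

    Raising here makes a bad path show up as a DAG import error in the
    Airflow UI instead of a cryptic "can't open file" at run time.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError("script not found inside container: " + path)
    return "python " + path

# Usage inside the DAG file would look like:
# task_1 = BashOperator(
#     task_id='my_task',
#     bash_command=script_command('/python_scripts/my_script.py'),
#     dag=dag,
# )
```

Note this check runs on the machine that parses the DAG (the scheduler container), so it only helps if that container mounts the same /python_scripts volume as the worker.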

The error is always:

INFO - python: can't open file

I searched and found that people solved this issue with absolute paths, but that doesn't fix it for me.

Edit: If I add ADD ./ ./ below WORKDIR / in the Dockerfile and delete these volumes from docker-compose.yml:

 1. ./libraries:/libraries

 2. ./python_scripts:/python_scripts

then the error is no longer "file not found" but "libraries not found" (an import module error). That is an improvement, but it doesn't make sense, because PYTHONPATH is defined to include the /libraries folder.

The volumes make more sense than the ADD statement, because I need code changes to be reflected inside the container instantly.

Edit 2: The volumes are mounted, but no files appear inside the container's folders, which is why the files can't be found. With ADD ./ ./ the folders do contain the files, because ADD copies everything in. Even then it doesn't work, because the libraries aren't found either.

asked Mar 17 '20 by mrc

1 Answer

Did you try

bash_command='python /usr/local/airflow/python_scripts/my_script.py' 

You also have to check that the folder has the correct permissions (access and execute for your user).
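If the scripts end up under AIRFLOW_HOME (via ADD or a volume mounted there), the command can be built from the environment instead of hard-coding the prefix. A sketch, assuming the layout suggested above:

```python
import os

# Assumes the scripts live under $AIRFLOW_HOME/python_scripts, matching the
# path suggested in the answer; the default mirrors the Dockerfile's value.
airflow_home = os.environ.get("AIRFLOW_HOME", "/usr/local/airflow")
command = "python {}/python_scripts/my_script.py".format(airflow_home)

# task_1 = BashOperator(task_id='my_task', bash_command=command, dag=dag)
```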

answered Oct 02 '22 by Geoffrey Pruvost