How to install packages in Airflow (docker-compose)?

The question is very similar to one that is already available. The only difference is that I run Airflow in Docker.

Step by step:

  1. Put docker-compose.yaml into the PyCharm project
  2. Put requirements.txt into the PyCharm project (a sample file is sketched below)
  3. Run docker-compose up
  4. Run a DAG and receive a ModuleNotFoundError

I want to start Airflow using docker-compose with the dependencies from requirements.txt. These dependencies should be available to the PyCharm interpreter and during DAG execution.

Is there a solution that doesn't require rebuilding the image?
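
For reference, requirements.txt is just the usual pip format; the package names below are only placeholders for whatever the DAGs import:

# requirements.txt (example)
pandas
apache-airflow-providers-ssh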

asked Jun 08 '21 by Makrushin Evgenii

3 Answers

Got the answer in the Airflow GitHub discussions. Currently, the only way to install extra Python packages is to build your own image. I will try to explain this solution in more detail.

Step 1. Put the Dockerfile, docker-compose.yaml and requirements.txt files into the project directory.

Step 2. Paste the code below into the Dockerfile:

# Extend the official Airflow image
FROM apache/airflow:2.1.0
# Copy the dependency list into the image
COPY requirements.txt .
# Install the extra Python packages
RUN pip install -r requirements.txt

Step 3. Paste into docker-compose.yaml the code from the official documentation, replacing the line image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0} with build: .:

---
version: '3'
x-airflow-common:
  &airflow-common
  build: .
  # REPLACED # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

# ...

At this point your project directory should look like this:

airflow-project
├── docker-compose.yaml
├── Dockerfile
└── requirements.txt

Step 4. Run docker-compose up to start Airflow; docker-compose should build your image automatically from the Dockerfile. Run docker-compose build to rebuild the image and update the dependencies.
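
To sanity-check the build, you can list the installed packages inside one of the running containers (airflow-webserver is one of the service names in the official compose file; replace <package-name> with one of your dependencies):

docker-compose exec airflow-webserver pip freeze | grep -i <package-name>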

answered Oct 08 '22 by Makrushin Evgenii


Is there a solution that doesn't require rebuilding the image?

Yes, there is now: as of October 2021 (v2.2.0), it is available as an environment variable:

_PIP_ADDITIONAL_REQUIREMENTS

It is used in the docker-compose.yaml file. That should do the trick without building a complete image, as some of the other answers explain (very well, actually :-).

See: https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml

Official documentation: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#environment-variables-supported-by-docker-compose
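
As a minimal sketch of how this is wired up: the official docker-compose.yaml already forwards the variable into the shared environment block, so it is enough to set it, for example in a .env file next to the compose file (the package names are just examples):

# .env
_PIP_ADDITIONAL_REQUIREMENTS=apache-airflow-providers-sftp apache-airflow-providers-ssh

# the corresponding line in the official docker-compose.yaml
x-airflow-common:
  environment:
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}

Keep in mind that these packages are reinstalled every time a container starts, so this variable is meant for quick iteration in development rather than for production images.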

answered Oct 08 '22 by Wxll


Another alternative is to update your docker-compose.yml file and add a command entry with all the packages you need:

  command: -c "pip3 install apache-airflow-providers-sftp  apache-airflow-providers-ssh --user"
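
For context, a hypothetical sketch of where such an override could live, modeled on the official airflow-init service (whose entrypoint is /bin/bash, which is what makes the -c form work); this is an assumption about your setup, not the only way to wire it:

# hypothetical one-off service that installs extra packages
airflow-install-deps:
  <<: *airflow-common
  entrypoint: /bin/bash
  command: -c "pip3 install apache-airflow-providers-sftp apache-airflow-providers-ssh --user"

Note that packages installed this way land in the container's writable layer, so they persist only for that container.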

Then re-run the initialization and bring the stack up:

docker-compose up airflow-init
docker-compose up

answered Oct 08 '22 by Gustavo Marquez