Deployment of Airflow Codebase

We are in the process of streamlining our build/deployment pipelines for our Airflow codebase.

Does anyone have experience with build and deployment pipelines for Apache Airflow using CI/CD tools?

How do you deploy your Airflow codebase, including DAGs, plugins, and operators, across environments such as test, staging, and production?

How do you manage the airflow.cfg configuration for each environment?

Where do you keep the configs for each environment?

Asked Aug 31 '18 by chandu kavar


People also ask

How is Airflow deployed?

Airflow sends simple instructions such as “execute task X of DAG Y”, but does not ship any DAG files or configuration with them. You can use a simple cron job or any other mechanism to sync DAGs and configs across your nodes, e.g. check out the DAGs from a git repo every 5 minutes on all nodes.
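
The cron-based sync mentioned above could look like this on each node; the branch name and the /opt/airflow/dags path are illustrative assumptions, not details from the answer:

    # Crontab entry: pull the latest DAGs from git every 5 minutes.
    */5 * * * * cd /opt/airflow/dags && git pull --quiet origin main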

How do you deploy an Airflow pipeline?

The first step is initializing a local Airflow instance with the Astro CLI. Initialize your Airflow project with astro dev init, then test your local environment with astro dev start. Finally, initialize a new Git repository in your project directory (git init) and push it to GitHub.
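
Those steps as a shell sequence (the remote URL is a placeholder to substitute with your own repository):

    astro dev init                     # scaffold a local Airflow project
    astro dev start                    # run Airflow locally in Docker to test it
    git init                           # start versioning the project
    git remote add origin <your-github-repo-url>   # placeholder URL
    git add .
    git commit -m "Initial Airflow project"
    git push -u origin main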

What language does Airflow use?

Airflow is written in Python, and workflows are created via Python scripts.

How do you deploy a new DAG in Airflow?

Example for a DAG like this (the snippet in the source is flattened onto one line and cut off mid-call; it is reconstructed below, with the invalid datetime(2017,12,05,...) literal fixed and the truncated DAG(...) call closed with a minimal, assumed completion):

    from airflow import DAG
    from datetime import datetime, timedelta

    default_args = {
        'owner': 'airflow',
        'depends_on_past': False,
        # 05 from the original is a syntax error in Python 3; use 5
        'start_date': datetime(2017, 12, 5, 23, 59),
        'email': ['[email protected]'],
        'email_on_failure': True,
    }

    # The source snippet ends mid-call at 'my_nice_dag-v1'; the
    # arguments below are a minimal, assumed completion.
    dag = DAG(
        'my_nice_dag-v1',
        default_args=default_args,
    )


1 Answer

We build all of our code into a Docker image (DAGs, plugins, different Python packages, different airflow.cfg files, etc.) that gets pushed up to our Kubernetes cluster. That same image runs everywhere, ensuring that dependencies stay locked down and that each Airflow instance is configured for its use case (we run multiple Airflow instances on our Kubernetes cluster).
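
A sketch of what such an image might look like; the base image tag, the directory layout, and the AIRFLOW_ENV build argument are assumptions for illustration, not details from the answer:

    # Hypothetical Dockerfile: bake DAGs, plugins, Python dependencies,
    # and an environment-specific airflow.cfg into a single image.
    FROM apache/airflow:2.7.3

    ARG AIRFLOW_ENV=production
    COPY --chown=airflow:root dags/ /opt/airflow/dags/
    COPY --chown=airflow:root plugins/ /opt/airflow/plugins/
    COPY --chown=airflow:root config/airflow.${AIRFLOW_ENV}.cfg /opt/airflow/airflow.cfg
    COPY requirements.txt /requirements.txt
    RUN pip install --no-cache-dir -r /requirements.txt

Each environment would then get its own build, e.g. docker build --build-arg AIRFLOW_ENV=staging -t airflow:staging .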

As far as CI/CD goes, since our deployment is essentially just a docker build and docker push, we've used CircleCI without any issues.
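
A minimal CircleCI sketch of that flow; the job layout, executor image, and the $REGISTRY variable are illustrative assumptions, and registry authentication is omitted for brevity:

    # Hypothetical .circleci/config.yml: build the image and push it.
    version: 2.1
    jobs:
      build-and-push:
        docker:
          - image: cimg/base:stable
        steps:
          - checkout
          - setup_remote_docker
          - run: docker build -t "$REGISTRY/airflow:$CIRCLE_SHA1" .
          - run: docker push "$REGISTRY/airflow:$CIRCLE_SHA1"
    workflows:
      deploy:
        jobs:
          - build-and-push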

For managing environments, we try to keep connections named the same across Airflow instances (e.g. redshift_conn) but with different credentials (dev Redshift vs. prod Redshift). There are probably more elegant solutions to this, but it has worked for us so far.
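
A small sketch of that pattern; the answer only names the connection id, so the hook choice, function, and SQL here are illustrative assumptions:

    # Each Airflow instance stores its own credentials under the shared
    # id "redshift_conn", so this code runs unchanged in dev and prod.
    from airflow.providers.postgres.hooks.postgres import PostgresHook

    def refresh_staging_table():
        # Resolves to dev Redshift on the dev instance and to prod
        # Redshift on the prod instance. Table name is illustrative.
        hook = PostgresHook(postgres_conn_id="redshift_conn")
        hook.run("TRUNCATE TABLE staging.events;")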

Answered Sep 30 '22 by Viraj Parekh