How to use DBT with AWS Managed Airflow?

Hope you are doing well. I wanted to check if anyone has gotten up and running with dbt on AWS MWAA (Airflow).

I have tried this one and this Python package without success, but they fail for one reason or another (can't find the dbt path, etc.).

Has anyone managed to use MWAA (Airflow 2) and dbt without having to build a Docker image and place it somewhere?

Thank you!

Nicolas Soria asked Jun 04 '21


People also ask

How is dbt different from Airflow?

Airflow uses workflows made of directed acyclic graphs (DAGs) of tasks. dbt is a modern data engineering framework maintained by dbt Labs that is becoming very popular in modern data architectures, leveraging cloud data platforms like Snowflake. The dbt CLI is the command line interface for running dbt projects.

Can Airflow be used with AWS?

Yes. Airflow runs on AWS with built-in security: you can control role-based authentication and authorization for Apache Airflow's user interface via AWS Identity and Access Management (IAM), providing users Single Sign-On (SSO) access for scheduling and viewing workflow executions.

How do I integrate DBT with airflow?

We have also seen how to run dbt with the command dbt run. So, one way we can integrate them is simply by creating a DAG that runs this command on our OS. Assuming you are connected to the EC2 instance and using the airflow user, create a DAG file: the task within the DAG runs the bash command dbt run, using Airflow's BashOperator.
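
A minimal sketch of that approach (the DAG id, schedule, and project path below are illustrative, not taken from the original post):

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A DAG whose single task shells out to `dbt run` in the dbt project directory.
with DAG(
    dag_id="dbt_run_example",
    start_date=datetime(2021, 6, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /home/airflow/dbt_project && dbt run",
    )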

What is Amazon managed workflows for Apache Airflow?

Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale.

How do I integrate airflow with AWS EC2?

So, one way we can integrate them is simply by creating a DAG that runs this command on our OS. Assuming you are connected to the EC2 instance and using the airflow user, create a DAG file: the task within the DAG runs the bash command dbt run, using Airflow's BashOperator. It's just like executing the command in our terminal.

How to run a job in DBT cloud?

But notice that here dbt Cloud comes in handy, since you can choose a custom branch and run jobs before you push your code to master. After you create the environment, you will see the environment page, with no jobs yet. Click on New Job, choose a name for the job, and in the environment option, choose the only option you will have.




1 Answer

I've managed to solve this by doing the following steps:

  1. Add dbt-core==0.19.1 to your requirements.txt
  2. Add the dbt CLI executable into plugins.zip:
#!/usr/bin/env python3
# EASY-INSTALL-ENTRY-SCRIPT: 'dbt-core==0.19.1','console_scripts','dbt'
__requires__ = 'dbt-core==0.19.1'
import re
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('dbt-core==0.19.1', 'console_scripts', 'dbt')()
    )

And from here you have two options:

  1. Set the dbt_bin operator argument to /usr/local/airflow/plugins/dbt (a sketch of this is shown after the list)
  2. Add /usr/local/airflow/plugins/ to the $PATH by following the docs
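
For the first option, a hedged sketch of what the DAG might look like with the airflow-dbt package's DbtRunOperator (the operator choice and the project/profiles paths are assumptions, not part of the original answer):

from datetime import datetime

from airflow import DAG
from airflow_dbt import DbtRunOperator  # provided by the airflow-dbt package

# Point dbt_bin at the executable shipped in plugins.zip; the project and
# profiles directories below are illustrative.
with DAG(
    dag_id="dbt_via_plugins_bin",
    start_date=datetime(2021, 6, 1),
    schedule_interval=None,
) as dag:
    dbt_run = DbtRunOperator(
        task_id="dbt_run",
        dbt_bin="/usr/local/airflow/plugins/dbt",
        profiles_dir="/usr/local/airflow/dags/dbt/",
        dir="/usr/local/airflow/dags/dbt/",
    )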

Environment variable setter example (for the second option):

from airflow.plugins_manager import AirflowPlugin
import os

# Append the MWAA user site-packages and the plugins directory to PATH so the
# dbt executable shipped in plugins.zip can be found by the workers.
os.environ["PATH"] = os.getenv(
    "PATH") + ":/usr/local/airflow/.local/lib/python3.7/site-packages:/usr/local/airflow/plugins/"


# Declaring the module as an Airflow plugin ensures it is imported (and PATH
# updated) when Airflow starts.
class EnvVarPlugin(AirflowPlugin):
    name = 'env_var_plugin'

The plugins zip content:

plugins.zip
├── dbt (DBT cli executable)
└── env_var_plugin.py (environment variable setter)
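
A small sketch (not part of the original answer) for assembling that plugins.zip locally before uploading it to the MWAA S3 bucket, recording the dbt wrapper as executable in the archive metadata:

import zipfile

# Build plugins.zip from the two files listed above and mark the dbt wrapper
# as executable in the zip's POSIX permission bits.
def build_plugins_zip(output_path="plugins.zip"):
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, mode in (("dbt", 0o755), ("env_var_plugin.py", 0o644)):
            zf.write(name, arcname=name)
            zf.getinfo(name).external_attr = mode << 16

if __name__ == "__main__":
    build_plugins_zip()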
Yonatan Kiron answered Oct 19 '22