Hope you are doing well. I wanted to check if anyone has gotten dbt up and running in AWS MWAA (Airflow).
I have tried this one and this Python package, but both fail for one reason or another (can't find the dbt path, etc.).
Has anyone managed to use MWAA (Airflow 2) and dbt without having to build a Docker image and place it somewhere?
Thank you!
Airflow uses workflows made of directed acyclic graphs (DAGs) of tasks. dbt is a data engineering framework maintained by dbt Labs that is becoming very popular in modern data architectures, leveraging cloud data platforms like Snowflake; the dbt CLI is the command-line interface for running dbt projects.
Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. It also runs Airflow with built-in security: you can control role-based authentication and authorization for Apache Airflow's user interface via AWS Identity and Access Management (IAM), providing users Single Sign-On (SSO) access for scheduling and viewing workflow executions.
We have also seen how to run dbt with the command dbt run, so one way to integrate the two is simply to create a DAG that runs this command on the OS. Assuming you are connected to the EC2 instance and using the airflow user, create a DAG file whose task runs the bash command dbt run via Airflow's BashOperator; it is just like executing the command in a terminal.
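A minimal sketch of such a DAG (the project path and DAG settings are illustrative, not from the original post, and assume the dbt CLI is already on the PATH):

```python
# Hypothetical example: run `dbt run` from Airflow with the BashOperator.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_run",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        # Change into the dbt project directory before invoking the CLI.
        bash_command="cd /home/airflow/my_dbt_project && dbt run",
    )
```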
Notice that this is where dbt Cloud can come in handy, since you can choose a custom branch and run jobs before you push your code to master. After you create the environment, you will see the environment page with no jobs yet. Click New Job, choose a name for the job, and in the environment option choose the only option you will have.
I've managed to solve this by doing the following steps:
1. Add dbt-core==0.19.1 to your requirements.txt.
2. Add the dbt CLI executable to your plugins.zip. The script below is the auto-generated console-script entry point for dbt:
```python
#!/usr/bin/env python3
# EASY-INSTALL-ENTRY-SCRIPT: 'dbt-core==0.19.1','console_scripts','dbt'
__requires__ = 'dbt-core==0.19.1'
import re
import sys

from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(
        load_entry_point('dbt-core==0.19.1', 'console_scripts', 'dbt')()
    )
```
And from here you have two options:
1. Set the dbt_bin operator argument to /usr/local/airflow/plugins/dbt (see the sketch after this list).
2. Add /usr/local/airflow/plugins/ to the $PATH by following the docs.
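A rough sketch of option 1, assuming the airflow-dbt package (whose operators accept a dbt_bin argument); the project and profiles paths are illustrative:

```python
# Hypothetical example of option 1: point an airflow-dbt operator at the
# dbt executable shipped in plugins.zip. Paths are illustrative.
from datetime import datetime

from airflow import DAG
from airflow_dbt.operators.dbt_operator import DbtRunOperator

with DAG(
    dag_id="dbt_run_via_dbt_bin",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = DbtRunOperator(
        task_id="dbt_run",
        dbt_bin="/usr/local/airflow/plugins/dbt",            # executable from plugins.zip
        dir="/usr/local/airflow/dags/dbt_project",           # dbt project deployed alongside the DAGs
        profiles_dir="/usr/local/airflow/dags/dbt_project",  # folder containing profiles.yml
    )
```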
For option 2, an environment variable setter example:
```python
from airflow.plugins_manager import AirflowPlugin
import os

# Append the user site-packages and the plugins directory (which holds the
# dbt executable) to PATH so the dbt CLI can be found by tasks.
os.environ["PATH"] = os.getenv(
    "PATH") + ":/usr/local/airflow/.local/lib/python3.7/site-packages:/usr/local/airflow/plugins/"


class EnvVarPlugin(AirflowPlugin):
    name = 'env_var_plugin'
```
The plugins.zip content:

```
plugins.zip
├── dbt (DBT cli executable)
└── env_var_plugin.py (environment variable setter)
```
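With the PATH updated by the plugin, a plain BashOperator task can invoke dbt by name. A minimal, illustrative sketch (the project path is an assumption, not from the original answer):

```python
# Hypothetical example of option 2: rely on env_var_plugin having added the
# plugins directory to PATH, so `dbt` can be called directly.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_run_via_path",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=(
            "dbt run "
            "--project-dir /usr/local/airflow/dags/dbt_project "
            "--profiles-dir /usr/local/airflow/dags/dbt_project"
        ),
    )
```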