Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing Postgres Sql in Apache Airflow

I have Apache-Airflow implemented on an Ubuntu version 18.04.3 server. When I set it up, I used the sql lite generic database, and this uses the sequential executor. I did this just to play around and get used to the system. Now I'm trying to use the Local Executor, and will need to transition my database from sqlite to the recommended postgres sql.

Does anybody know how to make this transition? All of the tutorials I've found entail setting up Airflow with postgres sql from the beginning. I know there are a ton of moving parts and I'm scared of messsing up what I currently have running. Anybody who knows how to do this or can point me at where to look is much appreciated. Thanks!

like image 991
adamSC Avatar asked Oct 14 '19 16:10

adamSC


People also ask

How do I connect to PostgreSQL database in Airflow?

Configuring the ConnectionSpecify the schema name to be used in the database. Specify the user name to connect. Specify the password to connect. Specify the extra parameters (as json dictionary) that can be used in postgres connection.

Does Airflow need postgres?

If you want to take a real test drive of Airflow, you should consider setting up a database backend to PostgreSQL, MySQL, or MSSQL. By default, Airflow uses SQLite, which is intended for development purposes only. Airflow supports the following database engine versions, so make sure which version you have.

What is postgres Airflow?

Airflow is essentially a graph (Directed Acyclic Graph) made up of tasks (nodes) and dependencies (edges). A task defined or implemented by a operator is a unit of work in your data pipeline. The purpose of Postgres Operator is to define tasks involving interactions with a PostgreSQL database.

What database does Airflow use?

By default, Airflow uses SQLite, which is intended for development purposes only.


1 Answers

Just to complete @lalligood answer with some commands:

In airflow.cfg file look for sql_alchemy_conn and update it to point to your PostgreSQL serv:

sql_alchemy_conn = postgresql+psycopg2://user:pass@hostadress:port/database

For instance:

sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow

As indicated in the above line you need both user and database called airflow, therefore you need to create that. To do so, open your psql command line and type the following commands to create a user and database called airflow and give all privileges over database airflow to user airflow:

CREATE USER airflow;
CREATE DATABASE airflow;
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;

Now you are ready to init the airflow application using postgres:

airflow initdb

If everything was right, access the psql command line again, enter in airflow database with \c airflow command and type \dt command to list all tables of that database. You should see a list of airflow tables, currently it is 23.

like image 117
Marcelo Machado Avatar answered Oct 24 '22 17:10

Marcelo Machado