Using Airflow, I extract data from a MySQL database, transform it with Python, and load it into a Redshift cluster.
Currently I use 3 Airflow tasks: they pass the data along by writing CSV files to local disk.
How could I do this without writing to disk? Should I write one big task in Python? (That would lower visibility.)
Edit: this is a question about Airflow, and about best practice for choosing the granularity of tasks and for passing data between them. It is not a general question about data migration or ETL. In this question, ETL is only used as an example of a workload for Airflow tasks.
There are different ways you can achieve this:
If you are using the AWS RDS service for MySQL, you can use AWS Data Pipeline to transfer data from MySQL to Redshift. It has a built-in template for exactly that, and you can even schedule incremental data transfers from MySQL to Redshift:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-template-redshift.html
How large is your table? If it is not too large, you can read the whole table into Python (as a Pandas DataFrame or as tuples) and then transfer it to Redshift. Even if the table is large, you can still read it in chunks and push each chunk to Redshift. Be aware that Pandas is a little inefficient in terms of memory usage when you read a table into it.
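A minimal sketch of that chunked approach, assuming Pandas and SQLAlchemy; the connection strings, table names, `transform()` function, and chunk size are all hypothetical placeholders. Note that row-by-row INSERTs via `to_sql` are slow on Redshift compared to a bulk `COPY` from S3, but they avoid writing any intermediate files:

```python
import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection strings -- replace with your own.
mysql_engine = create_engine("mysql+pymysql://user:pass@mysql-host/db")
redshift_engine = create_engine("postgresql+psycopg2://user:pass@redshift-host:5439/db")

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Apply your Python transformations to each chunk here.
    return df

# chunksize makes read_sql yield DataFrames lazily, so the whole
# table never has to fit in memory at once.
for chunk in pd.read_sql("SELECT * FROM source_table", mysql_engine, chunksize=10_000):
    transform(chunk).to_sql(
        "target_table",
        redshift_engine,
        if_exists="append",
        index=False,
    )
```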
Creating different tasks in Airflow will not help much here. You can either create a single Python function and call it from the DAG using a PythonOperator, or put the logic in a Python script and execute it using a BashOperator.
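A minimal sketch of the single-task layout described above, using the Airflow 2.x import path for PythonOperator; the DAG id, schedule, and the body of `etl()` are illustrative placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def etl():
    # Extract from MySQL, transform in memory, and load to Redshift,
    # e.g. with the chunked Pandas loop sketched above. Nothing is
    # written to local disk between the steps.
    ...

with DAG(
    dag_id="mysql_to_redshift",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="etl", python_callable=etl)
```

The trade-off is the one noted in the question: one task means less visibility in the Airflow UI, but it keeps the extract, transform, and load steps in a single process so the data never has to be serialized between tasks.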