
ETL using Python

I am working on a data warehouse and looking for an ETL solution that uses Python. I have played with SnapLogic as an ETL, but I was wondering if there were any other solutions out there.

This data warehouse is just getting started. I have not brought any data over yet. It will easily be over 100 GB with the initial subset of data I want to load into it.

emilam asked Sep 21 '10 16:09



2 Answers

Yes. Just write Python using a DB-API interface to your database.

Most ETL programs provide fancy "high-level languages" or drag-and-drop GUIs that don't help much.

Python is just as expressive and just as easy to work with.

Eschew obfuscation. Just use plain-old Python.

We do it every day and we're very, very pleased with the results. It's simple, clear and effective.
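To make "plain-old Python over DB-API" concrete, here is a minimal sketch, not our production code. It uses the standard-library `sqlite3` module as a stand-in for whatever DB-API driver your source and warehouse need. The table names (`raw_customers`, `customers`), the columns, and the cleanup rules are hypothetical, chosen only for illustration. Reading in batches with `fetchmany` keeps memory flat, which matters once you are moving 100 GB.

```python
import sqlite3


def extract(conn, batch_size=1000):
    """Yield lists of rows from the (hypothetical) raw_customers table."""
    cur = conn.cursor()
    cur.execute("SELECT id, name, signup_date FROM raw_customers")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def transform(rows):
    """Tidy names and keep only the date part of the timestamp string."""
    return [
        (row_id, name.strip().title(), signup[:10])
        for row_id, name, signup in rows
    ]


def load(conn, rows):
    """Insert one cleaned batch and commit it."""
    conn.cursor().executemany(
        "INSERT INTO customers (id, name, signup_date) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def run_etl(source, target, batch_size=1000):
    """Run extract -> transform -> load batch by batch; return the row count."""
    loaded = 0
    for batch in extract(source, batch_size):
        load(target, transform(batch))
        loaded += len(batch)
    return loaded


if __name__ == "__main__":
    # In-memory databases stand in for the real source and warehouse.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE raw_customers (id INTEGER, name TEXT, signup_date TEXT)"
    )
    source.execute(
        "INSERT INTO raw_customers VALUES (1, '  ada lovelace ', '2010-09-21 16:09:00')"
    )
    source.commit()

    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    print(run_etl(source, target))
    print(target.execute("SELECT * FROM customers").fetchall())
```

The `?` placeholders are the `qmark` parameter style that `sqlite3` uses. Other DB-API drivers may expect a different style (for example `%s`), so check the driver's `paramstyle` before reusing the SQL.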

S.Lott answered Oct 20 '22 09:10



You can use pyodbc, a third-party Python library, to extract data from various database sources. Then use pandas DataFrames to manipulate and clean the data as your organization requires, and use pyodbc again to load it into your data warehouse.
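A rough sketch of that extract, clean, load loop follows. To keep it runnable without an ODBC driver, it uses `sqlite3` connections as a stand-in; the assumption is that you would swap in connections from `pyodbc.connect(...)` with your own connection string. The tables (`raw_orders`, `orders`), columns, and cleaning rules are hypothetical. Passing `chunksize` to `pandas.read_sql_query` returns an iterator of DataFrames, so a large table never has to fit in memory at once.

```python
import sqlite3

import pandas as pd


def etl_in_chunks(src_conn, dst_conn, chunksize=50_000):
    """Copy raw_orders into orders chunk by chunk, cleaning each chunk in pandas."""
    total = 0
    chunks = pd.read_sql_query(
        "SELECT id, email, amount FROM raw_orders",
        src_conn,
        chunksize=chunksize,  # yields DataFrames instead of one big frame
    )
    for df in chunks:
        # Transform: normalise e-mail addresses, drop rows with no amount.
        df["email"] = df["email"].str.strip().str.lower()
        df = df.dropna(subset=["amount"])

        # Load: plain tuples go straight into a DB-API executemany call.
        rows = list(df.itertuples(index=False, name=None))
        dst_conn.cursor().executemany(
            "INSERT INTO orders (id, email, amount) VALUES (?, ?, ?)",
            rows,
        )
        dst_conn.commit()
        total += len(rows)
    return total


if __name__ == "__main__":
    # In-memory databases stand in for the real source and warehouse.
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE raw_orders (id INTEGER, email TEXT, amount REAL)")
    src.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?)",
        [(1, " A@X.com ", 10.0), (2, "b@y.com", None)],
    )
    src.commit()

    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE orders (id INTEGER, email TEXT, amount REAL)")
    print(etl_in_chunks(src, dst, chunksize=1))
```

The load step goes through `executemany` rather than `DataFrame.to_sql` because, as far as I know, `to_sql` expects a SQLAlchemy connectable or a `sqlite3` connection, not a raw pyodbc connection. Recent pandas versions also warn when `read_sql_query` is given a raw DB-API connection other than `sqlite3`, so wrapping pyodbc in a SQLAlchemy engine may be the cleaner route.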

Umar Aftab answered Oct 20 '22 09:10
