
Importing a CSV file into a PostgreSQL DB using Python-Django

Note: scroll down to the Background section for useful details. Assume the project uses Python-Django and South in the illustration below.

What's the best way to import the following CSV

"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"

into a PostgreSQL database with the related tables Person, Account, and AccountType, considering:

  1. Admin users can change the database model and CSV import-representation in real-time via a custom UI
  2. The saved CSV-to-Database table/field mappings are used when regular users import CSV files

So far, two approaches have been considered:

  1. ETL-API Approach: Providing an ETL API with a spreadsheet, my CSV-to-Database table/field mappings, and connection info for the target database. The API would then load the spreadsheet and populate the target database tables. Looking at pygrametl, I don't think what I'm aiming for is possible. In fact, I'm not sure any ETL APIs do this.
  2. Row-level Insert Approach: Parsing the CSV-to-Database table/field mappings, parsing the spreadsheet, and generating SQL inserts in "join-order".

I implemented the second approach but am struggling with algorithm defects and code complexity. Is there a Python ETL API out there that does what I want? Or an approach that doesn't involve reinventing the wheel?
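For context, here is a minimal sketch of what I mean by a mapping-driven, row-level insert, assuming psycopg2; the mapping structure, table names, and connection settings are all made up for illustration and are not my actual code:

```python
# Rough sketch of the row-level insert idea; all names are illustrative.
import csv
import psycopg2

# Hypothetical CSV-to-Database mapping: tables listed in "join-order"
# (parents before children), each mapping CSV column index -> DB column name.
MAPPING = [
    ("person",       {0: "first_name", 1: "last_name"}),
    ("account_type", {3: "name"}),
    ("account",      {2: "kind"}),  # foreign keys would still need resolving
]

def import_csv(path, dsn):
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur, open(path, newline="") as f:
        for row in csv.reader(f):
            for table, columns in MAPPING:
                cols = ", ".join(columns.values())
                placeholders = ", ".join(["%s"] * len(columns))
                values = [row[i] for i in columns]
                # ON CONFLICT keeps repeated parent rows from duplicating,
                # assuming a unique constraint exists on these columns.
                cur.execute(
                    f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
                    "ON CONFLICT DO NOTHING",
                    values,
                )
```

Resolving foreign keys between the tables and keeping the inserts in the right order is where the complexity creeps in.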


Background

The company I work at is looking to move hundreds of project-specific design spreadsheets hosted in SharePoint into databases. We're close to completing a web application that meets the need by allowing an administrator to define/model a database for each project, store spreadsheets in it, and define the browse experience. At this stage of completion, transitioning to a commercial tool isn't an option. Think of the web application as something like a django-admin alternative (though it isn't one), with a DB modeling UI, CSV import/export functionality, customizable browsing, and modularized code to address project-specific customizations.

The implemented CSV import interface is cumbersome and buggy, so I'm trying to get feedback and find alternate approaches.

asked Mar 18 '13 by Mario Aguilera


People also ask

How do I import a CSV file into PostgreSQL using Python?

First, import the psycopg2 package and establish a connection to a PostgreSQL database using the psycopg2.connect() method. Before importing a CSV file, we need to create a table. In the example below, we create a table by executing a "CREATE TABLE" SQL command using the cursor.
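A minimal sketch of that flow, assuming psycopg2 and made-up connection parameters, table name, and column names:

```python
# Sketch: load a quoted CSV into one table via psycopg2's COPY support.
import psycopg2

conn = psycopg2.connect(dbname="mydb", user="myuser",
                        password="secret", host="localhost")
cur = conn.cursor()

# Create the target table before importing.
cur.execute("""
    CREATE TABLE IF NOT EXISTS accounts (
        first_name text,
        last_name  text,
        account    text,
        kind       text
    )
""")

with open("accounts.csv") as f:
    # copy_expert handles quoted CSV fields, unlike plain copy_from.
    cur.copy_expert("COPY accounts FROM STDIN WITH (FORMAT csv)", f)

conn.commit()
cur.close()
conn.close()
```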

Can I use CSV as a database in Django?

Django uses Python's built-in csv library to create dynamic CSV (comma-separated values) files. We can use this library in our project's view file. Let's see an example below, where a Django view generates a CSV response.
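A short sketch of such a view, with an illustrative view name and hard-coded rows standing in for real data:

```python
# Sketch: a Django view that returns a dynamically generated CSV file.
import csv
from django.http import HttpResponse

def export_csv(request):
    response = HttpResponse(content_type="text/csv")
    response["Content-Disposition"] = 'attachment; filename="people.csv"'

    writer = csv.writer(response)
    writer.writerow(["first_name", "last_name", "account", "kind"])
    writer.writerow(["john", "doe", "savings", "personal"])
    return response
```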

How do I import and read a CSV file in Django?

Uploading a CSV file: first, create an HTML form to upload the CSV file, along the lines of the sketch below. Important: do not forget to include enctype="multipart/form-data" in the form. Then add a URL to urlpatterns.
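A minimal sketch of the view side, assuming a form field named csv_file and a template named upload.html (both illustrative):

```python
# Sketch: a Django view that reads rows from an uploaded CSV file.
import csv
import io
from django.shortcuts import render

def upload_csv(request):
    rows = []
    if request.method == "POST" and request.FILES.get("csv_file"):
        # The form must be submitted with enctype="multipart/form-data".
        data = request.FILES["csv_file"].read().decode("utf-8")
        rows = list(csv.reader(io.StringIO(data)))
    return render(request, "upload.html", {"rows": rows})
```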


1 Answer

How about separating the problem into two separate problems?

Create a Person class which represents a person in the database. This could use Django's ORM, or extend it, or you could do it yourself.

Now you have two issues:

  1. Create a Person instance from a row in the CSV.
  2. Save a Person instance to the database.

Now, instead of just CSV-to-Database, you have CSV-to-Person and Person-to-Database. I think this is conceptually cleaner. When the admins change the schema, that changes the Person-to-Database side. When the admins change the CSV format, that changes the CSV-to-Person side. Now you can deal with each separately.
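To illustrate the split, here is a rough sketch assuming a plain Python class and psycopg2; the field names and mapping format are made up and not a prescribed design:

```python
# Sketch of the CSV-to-Person / Person-to-Database separation.
import csv
import psycopg2

class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @classmethod
    def from_csv_row(cls, row, mapping):
        # CSV-to-Person: the admin-defined mapping says which CSV column
        # feeds which attribute, so CSV format changes only touch this side.
        return cls(row[mapping["first_name"]], row[mapping["last_name"]])

    def save(self, cur):
        # Person-to-Database: schema changes only touch this side.
        cur.execute(
            "INSERT INTO person (first_name, last_name) VALUES (%s, %s) "
            "ON CONFLICT DO NOTHING",
            (self.first_name, self.last_name),
        )

def import_people(path, mapping, dsn):
    conn = psycopg2.connect(dsn)
    with conn, conn.cursor() as cur, open(path, newline="") as f:
        for row in csv.reader(f):
            Person.from_csv_row(row, mapping).save(cur)
```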

Does that help any?

answered Oct 05 '22 by Claudiu