Note: Scroll down to the Background section for useful details. Assume the project uses Python-Django and South, in the following illustration.
What's the best way to import the following CSV
"john","doe","savings","personal"
"john","doe","savings","business"
"john","doe","checking","personal"
"john","doe","checking","business"
"jemma","donut","checking","personal"
Into a PostgreSQL database with the related tables Person, Account, and AccountType considering:
So far two approaches have been considered
I implemented the second approach but am struggling with algorithm defects and code complexity. Is there a python ETL API out there that does what I want? Or an approach that doesn't involve reinventing the wheel?
Background
The company I work at is looking to move hundreds of project-specific design spreadsheets hosted in sharepoint into databases. We're near completing a web application that meets the need by allowing an administrator to define/model a database for each project, store spreadsheets in it, and define the browse experience. At this stage of completion transitioning to a commercial tool isn't an option. Think of the web application as a django-admin alternative, though it isn't, with a DB modeling UI, CSV import/export functionality, customizable browse, and modularized code to address project-specific customizations.
The implemented CSV import interface is cumbersome and buggy so i'm trying to get feedback and find alternate approaches.
First, we import the psycopg2 package and establish a connection to a PostgreSQL database using the pyscopg2. connect() method. before importing a CSV file we need to create a table. In the example below, we created a table by executing the “create table” SQL command using the cursor.
Django uses python's built-in csv library to create Dynamic CSV (Comma Separated values) file. We can use this library into our project's view file. Lets see an example, here we have a django project to that we are implementing this feature.
Uploading CSV file: First create HTML form to upload the csv file. Use below code for the same. Important: Do not forget to include enctype="multipart/form-data" in form. Add a URL in URLpatterns.
How about separating the problem into two separate problems?
Create a Person
class which represents a person in the database. This could use Django's ORM, or extend it, or you could do it yourself.
Now you have two issues:
Person
instance from a row in the CSV.Person
instance to the database.Now, instead of just CSV-to-Database, you have CSV-to-Person and Person-to-Database. I think this is conceptually cleaner. When the admins change the schema, that changes the Person-to-Database side. When the admins change the CSV format, they're changing the CSV-to-Database side. Now you can deal with each separately.
Does that help any?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With