Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Populating a SQLite3 database from a .txt file with Python

I am trying to setup a website in django which allows the user to send queries to a database containing information about their representatives in the European Parliament. I have the data in a comma seperated .txt file with the following format:

Parliament, Name, Country, Party_Group, National_Party, Position

7, Marta Andreasen, United Kingdom, Europe of freedom and democracy Group, United Kingdom Independence Party, Member

etc....

I want to populate a SQLite3 database with this data, but so far all the tutorials I have found only show how to do this by hand. Since I have 736 observations in the file I dont really want to do this.

I suspect this is a simple matter, but I would be very grateful if someone could show me how to do this.

Thomas

like image 922
Thomas Jensen Avatar asked Jul 17 '10 09:07

Thomas Jensen


People also ask

How do I populate a SQLite database in Python?

First, connect to the SQLite database by creating a Connection object. Second, create a Cursor object by calling the cursor method of the Connection object. Third, execute an INSERT statement. If you want to pass arguments to the INSERT statement, you use the question mark (?) as the placeholder for each argument.

How do I import a text file into SQLite?

First, from the menu choose tool menu item. Second, choose the database and table that you want to import data then click the Next button. Third, choose CSV as the data source type, choose the CSV file in the Input file field, and choose the ,(comma) option as the Field separator as shown in the picture below.


4 Answers

So assuming your models.py looks something like this:

class Representative(models.Model):
    parliament = models.CharField(max_length=128)
    name = models.CharField(max_length=128)
    country = models.CharField(max_length=128)
    party_group = models.CharField(max_length=128)
    national_party = models.CharField(max_length=128)
    position = models.CharField(max_length=128)

You can then run python manage.py shell and execute the following:

import csv
from your_app.models import Representative
# If you're using different field names, change this list accordingly.
# The order must also match the column order in the CSV file.
fields = ['parliament', 'name', 'country', 'party_group', 'national_party', 'position']
for row in csv.reader(open('your_file.csv')):
    Representative.objects.create(**dict(zip(fields, row)))

And you're done.

Addendum (edit)

Per Thomas's request, here's an explanation of what **dict(zip(fields,row)) does:

So initially, fields contains a list of field names that we defined, and row contains a list of values that represents the current row in the CSV file.

fields = ['parliament', 'name', 'country', ...]
row = ['7', 'Marta Andreasen', 'United Kingdom', ...]

What zip() does is it combines two lists into one list of pairs of items from both lists (like a zipper); i.e. zip(['a','b,'c'], ['A','B','C']) will return [('a','A'), ('b','B'), ('c','C')]. So in our case:

>>> zip(fields, row)
[('parliament', '7'), ('name', 'Marta Andreasen'), ('country', 'United Kingdom'), ...]

The dict() function simply converts the list of pairs into a dictionary.

>>> dict(zip(fields, row))
{'parliament': '7', 'name': 'Marta Andreasen', 'country': 'United Kingdom', ...}

The ** is a way of converting a dictionary into a keyword argument list for a function. So function(**{'key': 'value'}) is the equivalent of function(key='value'). So in out example, calling create(**dict(zip(field, row))) is the equivalent of:

create(parliament='7', name='Marta Andreasen', country='United Kingdom', ...)

Hope this clears things up.

like image 92
Aram Dulyan Avatar answered Oct 27 '22 22:10

Aram Dulyan


As SiggyF says and only slightly differently than Joschua:

Create a text file with your schema, e.g.:

CREATE TABLE politicians (
    Parliament text, 
    Name text, 
    Country text, 
    Party_Group text, 
    National_Party text, 
    Position text
);

Create table:

>>> import csv, sqlite3
>>> conn = sqlite3.connect('my.db')
>>> c = conn.cursor()
>>> with open('myschema.sql') as f:            # read in schema file 
...   schema = f.read()
... 
>>> c.execute(schema)                          # create table per schema 
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()                              # commit table creation

Use csv module to read file with data to be inserted:

>>> csv_reader = csv.reader(open('myfile.txt'), skipinitialspace=True)
>>> csv_reader.next()                          # skip the first line in the file
['Parliament', 'Name', 'Country', ...

# put all data in a tuple
# edit: decoding from utf-8 file to unicode
>>> to_db = tuple([i.decode('utf-8') for i in line] for line in csv_reader)
>>> to_db                                      # this will be inserted into table
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Insert data:

>>> c.executemany("INSERT INTO politicians VALUES (?,?,?,?,?,?);", to_db)
<sqlite3.Cursor object at 0x1392f50>
>>> conn.commit()

Verify that all went as expected:

>>> c.execute('SELECT * FROM politicians').fetchall()
[(u'7', u'Marta Andreasen', u'United Kingdom', ...

Edit:
And since you've decoded (to unicode) on input, you need to be sure to encode on output.
For example:

with open('encoded_output.txt', 'w') as f:
  for row in c.execute('SELECT * FROM politicians').fetchall():
    for col in row:
      f.write(col.encode('utf-8'))
      f.write('\n')
like image 21
mechanical_meat Avatar answered Oct 27 '22 23:10

mechanical_meat


You could read the data using the csv module. Then you can create an insert sql statement and use the method executemany:

  cursor.executemany(sql, rows)

or use add_all if you use sqlalchemy.

like image 45
SiggyF Avatar answered Oct 27 '22 22:10

SiggyF


You asked what the create(**dict(zip(fields, row))) line did.

I don't know how to reply directly to your comment, so I'll try to answer it here.

zip takes multiple lists as args and returns a list of their correspond elements as tuples.

zip(list1, list2) => [(list1[0], list2[0]), (list1[1], list2[1]), .... ]

dict takes a list of 2-element tuples and returns a dictionary mapping each tuple's first element (key) to its second element (value).

create is a function that takes keyword arguments. You can use **some_dictionary to pass that dictionary into a function as keyword arguments.

create(**{'name':'john', 'age':5}) => create(name='john', age=5)

like image 32
Steve Avatar answered Oct 27 '22 22:10

Steve