
Convert JSON to SQLite in Python - How to map json keys to database columns properly?

I want to convert a JSON file I created to a SQLite database.

My intention is to decide later which data container and entry point is best, json (data entry via text editor) or SQLite (data entry via spreadsheet-like GUIs like SQLiteStudio).

My json file is like this (containing traffic data from some crossroads in my city):

    ...
    "2011-12-17 16:00": {
        "local": "Av. Protásio Alves; esquina Ramiro Barcelos",
        "coord": "-30.036916,-51.208093",
        "sentido": "bairro-centro",
        "veiculos": "automotores",
        "modalidade": "semaforo 50-15",
        "regime": "típico",
        "pistas": "2+c",
        "medicoes": [
            [32, 50],
            [40, 50],
            [29, 50],
            [32, 50],
            [35, 50]
        ]
    },
    "2011-12-19 08:38": {
        "local": "R. Fernandes Vieira; esquina Protásio Alves",
        "coord": "-30.035535,-51.211079",
        "sentido": "único",
        "veiculos": "automotores",
        "modalidade": "semáforo 30-70",
        "regime": "típico",
        "pistas": "3",
        "medicoes": [
            [23, 30],
            [32, 30],
            [33, 30],
            [32, 30]
        ]
    }
    ...

And I have created a nice database with a one-to-many relation using these lines of Python code:

    import sqlite3

    db = sqlite3.connect("fluxos.sqlite")
    c = db.cursor()

    c.execute('''create table medicoes
             (timestamp text primary key,
              local text,
              coord text,
              sentido text,
              veiculos text,
              modalidade text,
              pistas text)''')

    c.execute('''create table valores
             (id integer primary key,
              quantidade integer,
              tempo integer,
              foreign key (id) references medicoes(timestamp))''')

BUT the problem is, when I was preparing to insert the rows with actual data with something like c.execute("insert into medicoes values(?,?,?,?,?,?,?)" % keys), I realized that, since the dict loaded from the JSON file has no special order, it does not map properly to the column order of the database.

So, I ask: which strategy/method should I use to programmatically read the keys from each "block" in the JSON file (in this case, "local", "coord", "sentido", "veiculos", "modalidade", "regime", "pistas" and "medicoes"), create the database with the columns in that same order, and then insert the rows with the proper values?

I have a fair amount of experience with Python, but am just beginning with SQL, so I would like some advice about good practices, and not necessarily a ready-made recipe.

asked Jan 10 '12 22:01 by heltonbiker



1 Answer

You have this Python code:

    c.execute("insert into medicoes values(?,?,?,?,?,?,?)" % keys)

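which I think should be

    c.execute("insert into medicoes values (?,?,?,?,?,?,?)", keys)

since the % operator performs string formatting and expects the string to its left to contain formatting codes such as %s. The ? placeholders are not formatting codes; sqlite3 fills them in itself when the values are passed as the second argument to execute().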
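To make the difference concrete, here is a minimal sketch in Python 3. The in-memory database and the two-column table `t` are assumptions made up for illustration; they are not part of the question's schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")               # throwaway in-memory database
db.execute("create table t (a text, b text)")  # hypothetical table for the demo

row = ("x", "y")

# Correct: sqlite3 binds each value in `row` to one "?" placeholder.
db.execute("insert into t values (?,?)", row)

# Incorrect: "%" is string formatting, and "?" is not a format code,
# so Python raises TypeError before sqlite3 is ever called.
try:
    "insert into t values (?,?)" % row
    percent_failed = False
except TypeError:
    percent_failed = True

stored = db.execute("select a, b from t").fetchall()
```

Passing the values separately also lets sqlite3 handle quoting, so strings containing quotes or semicolons are stored as data rather than interpreted as SQL.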

Now all you need to make this work is for keys to be a tuple (or list) containing the values for the new row of the medicoes table, in the same order as the table's columns. Consider the following Python code:

    # Python 2 syntax (iteritems, print statement)
    import json

    traffic = json.load(open('xxx.json'))

    # Column names listed explicitly, in the table's column order
    columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']
    for timestamp, data in traffic.iteritems():
        # Timestamp first, then one value per column, in order
        keys = (timestamp,) + tuple(data[c] for c in columns)
        print str(keys)

When I run this with your sample data, I get:

    (u'2011-12-19 08:38', u'R. Fernandes Vieira; esquina Prot\xe1sio Alves', u'-30.035535,-51.211079', u'\xfanico', u'automotores', u'sem\xe1foro 30-70', u'3')
    (u'2011-12-17 16:00', u'Av. Prot\xe1sio Alves; esquina Ramiro Barcelos', u'-30.036916,-51.208093', u'bairro-centro', u'automotores', u'semaforo 50-15', u'2+c')

which would seem to be the tuples you require.
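The snippets in this answer use Python 2 idioms such as iteritems() and the print statement. As a rough Python 3 equivalent of the tuple-building step, the sketch below reads an inline sample (trimmed to one entry, standing in for the JSON file) and uses a helper named `row_for`, which is a hypothetical name introduced only for this example.

```python
import json

# Inline sample standing in for the JSON file (one entry, trimmed).
sample = '''
{"2011-12-19 08:38": {
    "local": "R. Fernandes Vieira; esquina Protásio Alves",
    "coord": "-30.035535,-51.211079",
    "sentido": "único",
    "veiculos": "automotores",
    "modalidade": "semáforo 30-70",
    "regime": "típico",
    "pistas": "3",
    "medicoes": [[23, 30], [32, 30]]
}}
'''
traffic = json.loads(sample)

# The explicit list fixes the order, whatever order the dict has.
columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']

def row_for(timestamp, data):
    """Return the values in table-column order, timestamp first."""
    return (timestamp,) + tuple(data[c] for c in columns)

rows = [row_for(ts, data) for ts, data in traffic.items()]
print(rows[0])
```

Because each value is looked up by name from the explicit `columns` list, the order of keys in the loaded dict does not matter.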

You could add the necessary sqlite code with something like this:

    # Python 2 syntax (iteritems)
    import json
    import sqlite3

    traffic = json.load(open('xxx.json'))
    db = sqlite3.connect("fluxos.sqlite")

    query = "insert into medicoes values (?,?,?,?,?,?,?)"
    columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']
    for timestamp, data in traffic.iteritems():
        keys = (timestamp,) + tuple(data[c] for c in columns)
        c = db.cursor()
        c.execute(query, keys)
        c.close()

    # Without a commit the inserted rows are not saved to the file
    db.commit()
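The nested "medicoes" lists can be loaded in the same pass. The following Python 3 sketch is only one possible approach and rests on two assumptions that are not in the question: the valores table gets an extra `timestamp` column pointing at its parent row (the question's schema reuses `id` for that), and each inner pair is read as (quantidade, tempo). It uses an in-memory database and inline data so it can run on its own.

```python
import sqlite3

# Inline stand-in for the dict returned by json.load (one entry, trimmed).
traffic = {
    "2011-12-17 16:00": {
        "local": "Av. Protásio Alves; esquina Ramiro Barcelos",
        "coord": "-30.036916,-51.208093",
        "sentido": "bairro-centro",
        "veiculos": "automotores",
        "modalidade": "semaforo 50-15",
        "regime": "típico",
        "pistas": "2+c",
        "medicoes": [[32, 50], [40, 50]],
    }
}

db = sqlite3.connect(":memory:")
db.execute("""create table medicoes
              (timestamp text primary key, local text, coord text,
               sentido text, veiculos text, modalidade text, pistas text)""")
# ASSUMPTION: a separate "timestamp" column links each value to its parent.
db.execute("""create table valores
              (id integer primary key, timestamp text,
               quantidade integer, tempo integer,
               foreign key (timestamp) references medicoes(timestamp))""")

columns = ['local', 'coord', 'sentido', 'veiculos', 'modalidade', 'pistas']
parents = [(ts,) + tuple(d[c] for c in columns) for ts, d in traffic.items()]
# ASSUMPTION: each inner pair is (quantidade, tempo).
children = [(ts, q, t) for ts, d in traffic.items() for q, t in d["medicoes"]]

with db:  # commits on success, rolls back if an exception is raised
    db.executemany("insert into medicoes values (?,?,?,?,?,?,?)", parents)
    db.executemany(
        "insert into valores (timestamp, quantidade, tempo) values (?,?,?)",
        children)

n_parents = db.execute("select count(*) from medicoes").fetchone()[0]
n_children = db.execute("select count(*) from valores").fetchone()[0]
```

Using executemany with a list of tuples avoids creating a cursor per row, and the `with db:` block takes care of the commit.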

Edit: if you don't want to hard-code the list of columns, you could do something like this:

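    # Python 2 syntax (itervalues, print statement)
    import json

    traffic = json.load(open('xxx.json'))

    # Take the keys of an arbitrary entry as the column list
    someitem = traffic.itervalues().next()
    columns = list(someitem.keys())
    print columns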

When I run this it prints:

[u'medicoes', u'veiculos', u'coord', u'modalidade', u'sentido', u'local', u'pistas', u'regime'] 

You could use it with something like this:

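    # Python 2 syntax (itervalues, iteritems, print statement)
    import json
    import sqlite3

    db = sqlite3.connect('fluxos.sqlite')
    traffic = json.load(open('xxx.json'))

    someitem = traffic.itervalues().next()
    columns = list(someitem.keys())
    # These two keys have no matching column in the medicoes table
    columns.remove('medicoes')
    columns.remove('regime')

    # Name the columns explicitly so the value order matches the query
    query = "insert into medicoes (timestamp,{0}) values (?{1})"
    query = query.format(",".join(columns), ",?" * len(columns))
    print query

    for timestamp, data in traffic.iteritems():
        keys = (timestamp,) + tuple(data[c] for c in columns)
        c = db.cursor()
        c.execute(query, keys)
        c.close()

    # Without a commit the inserted rows are not saved to the file
    db.commit()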

The query this code prints when I try it with your sample data is something like this:

insert into medicoes (timestamp,veiculos,coord,modalidade,sentido,local,pistas) values (?,?,?,?,?,?,?) 
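Another way to make the mapping independent of key order is to use sqlite3's named placeholders (`:name`) and pass a dict. The Python 3 sketch below is a hedged illustration rather than part of the original answer: it reads the column names from the table with `pragma table_info` instead of from the JSON file, since column names cannot be bound as parameters, and it uses an in-memory database with one inline block of sample data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""create table medicoes
              (timestamp text primary key, local text, coord text,
               sentido text, veiculos text, modalidade text, pistas text)""")

# Each table_info row is (cid, name, type, notnull, dflt_value, pk).
table_columns = [row[1] for row in db.execute("pragma table_info(medicoes)")]

# Build "insert ... (a, b) values (:a, :b)" from the table's own columns.
query = "insert into medicoes ({0}) values ({1})".format(
    ", ".join(table_columns),
    ", ".join(":" + name for name in table_columns))

data = {  # one JSON block: keys in arbitrary order, plus keys with no column
    "pistas": "3", "regime": "típico", "medicoes": [[23, 30]],
    "local": "R. Fernandes Vieira; esquina Protásio Alves",
    "coord": "-30.035535,-51.211079", "sentido": "único",
    "veiculos": "automotores", "modalidade": "semáforo 30-70",
}
# Keep only keys that are real columns; missing ones become NULL.
params = {name: data.get(name) for name in table_columns}
params["timestamp"] = "2011-12-19 08:38"

with db:  # commits on success
    db.execute(query, params)

stored = db.execute("select timestamp, pistas, local from medicoes").fetchone()
```

Taking the column names from the database rather than from the input means keys such as "regime" and "medicoes" are skipped automatically, and no text from the JSON file is ever interpolated into the SQL string.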
answered Sep 29 '22 13:09 by srgerg