I am playing a bit with the Python API for sqlite3. I have a small table for storing languages, with id, name and creation_date fields. I am trying to map the raw query results into a namedtuple, as the docs recommend, so that I can manage rows in a more readable way. Here is my namedtuple:
LanguageRecord = namedtuple('LanguageRecord', 'id, name, creation_date')
The code that the docs suggest for the mapping is as follows:
for language in map(LanguageRecord._make, c.fetchall()):
    # do something with languages
This is fine when I want to return a collection of languages, but in this case I want to retrieve just one language:
c.execute('SELECT * FROM language WHERE name=?', (name,))
So my first attempt was something like this:
language = map(LanguageRecord._make, c.fetchone())
This code doesn't work because fetchone() returns a single tuple rather than a list containing one tuple, so map tries to create three namedtuples, one for each field of the tuple.
My first approach to solve this was to explicitly create a list and append to it the tuple result, something like:
languages = []
languages.append(c.fetchone())
for language in map(LanguageRecord._make, languages):
    # do something with language
My second approach was to use fetchall() even though I want just one record. I can put a unique constraint on the name field in the database to guarantee a single result.
for language in map(LanguageRecord._make, c.fetchall()):
    # do something with languages
Another approach could be to use fetchall()[0] without the unique constraint to guarantee just one result.
My question is: which is the best and most common way to deal with this problem? Should I always use fetchall to maintain a common interface and let the database manage the uniqueness logic, or should I create a list explicitly as in approach 1? Is there an easier way to accomplish this task?
There is a much easier way! sqlite3 lets the user define "row factories". A row factory takes the cursor and the row tuple and can return whatever type of object you want.
Once you set the row factory with
con.row_factory = my_row_factory
then rows returned by the cursor will be the result of my_row_factory
applied to the tuple-row. For example,
import sqlite3
import collections
LanguageRecord = collections.namedtuple('LanguageRecord', 'id name creation_date')
def namedtuple_factory(cursor, row):
    return LanguageRecord(*row)
con = sqlite3.connect(":memory:")
con.row_factory = namedtuple_factory
cur = con.cursor()
cur.execute("select 1,2,3")
print(cur.fetchone())
yields
LanguageRecord(id=1, name=2, creation_date=3)
For another example of how to define a namedtuple factory, see this post.
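For the single-row case in the question, a row factory is not strictly required: `_make` accepts any iterable, so it can be applied directly to the tuple from `fetchone()` rather than through `map`. Here is a minimal sketch, assuming an in-memory `language` table made up for illustration. `fetch_language` is a hypothetical helper name, and it returns `None` when nothing matches, just as `fetchone()` does:

```python
import sqlite3
from collections import namedtuple

LanguageRecord = namedtuple('LanguageRecord', 'id name creation_date')

# Illustrative schema and data; not taken from the original question.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE language (id INTEGER PRIMARY KEY, name TEXT UNIQUE, creation_date TEXT)")
con.execute("INSERT INTO language (name, creation_date) VALUES (?, ?)", ("python", "1991-02-20"))


def fetch_language(connection, name):
    """Return one LanguageRecord, or None if no row matches (hypothetical helper)."""
    cur = connection.execute(
        "SELECT id, name, creation_date FROM language WHERE name=?", (name,)
    )
    row = cur.fetchone()  # a plain tuple, or None when there is no match
    return LanguageRecord._make(row) if row is not None else None


language = fetch_language(con, "python")
missing = fetch_language(con, "cobol")
print(language)
print(missing)
```

The explicit `None` check matters because `_make(None)` would raise a `TypeError`.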
By the way, if you set
conn.row_factory = sqlite3.Row
then rows are returned as sqlite3.Row objects, which behave much like dicts keyed by the column names (they also still support access by index). Thus, instead of accessing parts of the namedtuple with things like row.creation_date, you could just use the builtin sqlite3.Row row factory and access the equivalent with row['creation_date'].
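A short sketch of what `sqlite3.Row` access looks like in practice. The query and its column aliases are made up for illustration; `keys()` and index access are part of the standard `sqlite3.Row` interface:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.row_factory = sqlite3.Row
row = con.execute(
    "SELECT 1 AS id, 'python' AS name, '1991-02-20' AS creation_date"
).fetchone()

by_name = row["creation_date"]  # access by column name
by_index = row[0]               # access by position still works
columns = row.keys()            # list of column names
as_dict = dict(row)             # convert to a real dict when one is needed
print(by_name, by_index, columns, as_dict)
```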
A more general row_factory, which can be reused for all sorts of queries, is this:
import sqlite3
from collections import namedtuple

def namedtuple_factory(cursor, row):
    """Returns sqlite rows as named tuples."""
    # cursor.description holds one 7-tuple per column; item 0 is the column name
    fields = [col[0] for col in cursor.description]
    Row = namedtuple("Row", fields)
    return Row(*row)

conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
cur = conn.cursor()
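One thing to watch with a factory built from `cursor.description`: `namedtuple` rejects field names that are not valid identifiers, and unaliased expressions in a SELECT produce such names. Below is a hedged sketch of two ways around it: aliasing the column in SQL, or passing `rename=True` (a standard `namedtuple` option that replaces invalid names with positional ones such as `_0`). The name `safe_namedtuple_factory` is made up here:

```python
import sqlite3
from collections import namedtuple


def safe_namedtuple_factory(cursor, row):
    """Like the factory above, but tolerates column names that are not identifiers."""
    fields = [col[0] for col in cursor.description]
    # rename=True swaps invalid field names for _0, _1, ... instead of raising
    Row = namedtuple("Row", fields, rename=True)
    return Row(*row)


conn = sqlite3.connect(":memory:")
conn.row_factory = safe_namedtuple_factory

aliased = conn.execute("SELECT 1 AS id, 'python' AS name").fetchone()
unaliased = conn.execute("SELECT 1 + 1").fetchone()
print(aliased)
print(unaliased._fields)
```

Aliasing in SQL is usually the more readable option, since the resulting attribute names are meaningful.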
Here is another row_factory built on top of namedtuple, which creates the class only once and then reuses it:
from collections import namedtuple

def namedtuple_factory(cursor, row, cls=[None]):
    # cls is a mutable default used as a one-slot cache for the row class
    rf = cls[0]
    if rf is None:
        fields = [col[0] for col in cursor.description]
        cls[0] = namedtuple("Row", fields)
        return cls[0](*row)
    return rf(*row)
One can generalize further in order to use other class factories:
def make_row_factory(cls_factory, **kw):
    def row_factory(cursor, row, cls=[None]):
        rf = cls[0]
        if rf is None:
            fields = [col[0] for col in cursor.description]
            cls[0] = cls_factory("Row", fields, **kw)
            return cls[0](*row)
        return rf(*row)
    return row_factory
These factory functions are useful when all query results have the same fields, because the row class is built from the columns of the first row and then reused for every later row.
Examples:
namedtuple_factory = make_row_factory(namedtuple)
import dataclasses
row_factory = make_row_factory(dataclasses.make_dataclass)
# requires: pip3 install recordclass
import recordclass
row_factory = make_row_factory(recordclass.make_dataclass, fast_new=True)
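The factories above cache the first class they build, which is why they assume every query returns the same fields. If one connection runs queries with different columns, a possible variation (not from the answer above, just a sketch) is to key the cache on the column names using `functools.lru_cache`. The names `_row_class` and `cached_namedtuple_factory` are hypothetical:

```python
import sqlite3
from collections import namedtuple
from functools import lru_cache


@lru_cache(maxsize=None)
def _row_class(fields):
    """Build the namedtuple class once per distinct tuple of column names."""
    return namedtuple("Row", fields, rename=True)


def cached_namedtuple_factory(cursor, row):
    # A tuple is hashable, so it can serve as the lru_cache key
    fields = tuple(col[0] for col in cursor.description)
    return _row_class(fields)(*row)


conn = sqlite3.connect(":memory:")
conn.row_factory = cached_namedtuple_factory
a = conn.execute("SELECT 1 AS id, 2 AS x").fetchone()
b = conn.execute("SELECT 'python' AS name").fetchone()
c = conn.execute("SELECT 3 AS id, 4 AS x").fetchone()
print(a, b, c)
```

Rows from queries with the same columns share one class, while a query with different columns gets its own, at the cost of a small lookup per row.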
Here are some timings comparing the different approaches (Debian Linux, 64-bit, Python 3.9).
Script for creating the test database:
import sqlite3
from random import random, randint

N = 1000000
conn = sqlite3.connect('example.db')
c = conn.cursor()
c.execute('''CREATE TABLE test
(id int, x real, y real, p int, q int)''')
gen = ((i, random(), random(), randint(0,N), randint(0,N)) for i in range(N))
c.executemany("INSERT INTO test VALUES (?,?,?,?,?)", gen)
conn.commit()
conn.close()
Default:
conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')
CPU times: user 971 ms, sys: 92.1 ms, total: 1.06 s
Wall time: 1.06 s
80 Mb
sqlite3.Row:
conn = sqlite3.connect('example.db')
conn.row_factory = sqlite3.Row
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
# print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')
CPU times: user 1.11 s, sys: 80.1 ms, total: 1.19 s
Wall time: 1.19 s
namedtuple:
from collections import namedtuple
Row = namedtuple("Row", "id x y p q")
conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [Row(*row) for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')
CPU times: user 1.89 s, sys: 71.8 ms, total: 1.96 s
Wall time: 1.96 s
80 Mb
namedtuple-based row factory:
conn = sqlite3.connect('example.db')
conn.row_factory = make_row_factory(namedtuple)
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')
CPU times: user 1.93 s, sys: 116 ms, total: 2.05 s
Wall time: 2.05 s
80 Mb
recordclass:
from recordclass import make_dataclass
Row = make_dataclass("Row", "id x y p q", fast_new=True)
conn = sqlite3.connect('example.db')
c = conn.cursor()
%time res = [Row(*row) for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')
CPU times: user 1 s, sys: 72.2 ms, total: 1.08 s
Wall time: 1.07 s
56 Mb
recordclass-based row factory:
conn = sqlite3.connect('example.db')
conn.row_factory = make_row_factory(make_dataclass, fast_new=True)
c = conn.cursor()
%time res = [row for row in c.execute("SELECT id,x,y,p,q FROM test")]
conn.close()
print(N * sys.getsizeof(res[0]) // 1000000, 'Mb')
CPU times: user 1.11 s, sys: 76.2 ms, total: 1.19 s
Wall time: 1.19 s
56 Mb
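The `%time` lines above only work inside IPython. To try a similar comparison as a plain script, something like the following sketch could be used. It assumes a much smaller in-memory table so that it runs quickly, `time_fetch` is a made-up helper, and the numbers it prints will differ from those above. Note that `sys.getsizeof` is shallow: it measures the row object itself, not the values it refers to.

```python
import sqlite3
import sys
import time
from collections import namedtuple
from random import random, randint

N = 10_000  # far smaller than the table above, to keep the sketch quick

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id int, x real, y real, p int, q int)")
conn.executemany(
    "INSERT INTO test VALUES (?,?,?,?,?)",
    ((i, random(), random(), randint(0, N), randint(0, N)) for i in range(N)),
)

Row = namedtuple("Row", "id x y p q")


def time_fetch(label, make_row):
    """Fetch all rows, converting each with make_row, and report wall time."""
    start = time.perf_counter()
    res = [make_row(row) for row in conn.execute("SELECT id,x,y,p,q FROM test")]
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s, {sys.getsizeof(res[0])} bytes per row (shallow)")
    return res


plain = time_fetch("tuple", lambda row: row)
named = time_fetch("namedtuple", Row._make)
conn.close()
```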