I'm doing big batch inserts into an SQLite3 database and I'm trying to get a sense for what sort of performance I should be expecting versus what I'm actually seeing.
My table looks like this:
cursor.execute(
    """CREATE TABLE tweets(
           tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
           tweet_id INTEGER,
           tweet_text TEXT)"""
)
and my inserts look like this:
cursor.executemany("INSERT INTO tweets VALUES (?, ?, ?)", to_write)
where to_write is a list of tuples.
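For concreteness, a minimal sketch of how such a to_write batch might be built (the hashing scheme and sample data are made up for illustration; the question does not say how tweet_hash is computed):

import hashlib

# Each tuple matches the (tweet_hash, tweet_id, tweet_text) column order
# expected by the INSERT statement above.
tweets = [(1, "first tweet"), (2, "second tweet")]
to_write = [
    (hashlib.sha1(text.encode("utf-8")).hexdigest(), tweet_id, text)
    for tweet_id, text in tweets
]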
Currently, with about 12 million rows in the database, inserting 50,000 rows takes around 16 minutes, running on a 2008 MacBook.
Does this sound reasonable, or is there something gross happening?
The speed advantage of executemany() over calling execute() in a loop skyrockets in cases like this. For example, on my machine, inserting 1,000 rows into the same table in a database on the local network takes 410 ms using cursor.execute(), whereas using cursor.executemany() requires only 20 ms.
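A minimal sketch of that kind of comparison, assuming a local database file named tweets.db (table name, data, and timings are illustrative; actual numbers depend on your machine and network):

import sqlite3
import time

conn = sqlite3.connect("tweets.db")
cursor = conn.cursor()
cursor.execute(
    """CREATE TABLE IF NOT EXISTS tweets(
           tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
           tweet_id INTEGER,
           tweet_text TEXT)"""
)
rows = [("hash%d" % i, i, "text %d" % i) for i in range(1000)]

# One statement per call: every row pays the per-call overhead separately.
start = time.perf_counter()
for row in rows:
    cursor.execute("INSERT INTO tweets VALUES (?, ?, ?)", row)
conn.commit()
print("execute loop:  %.3f s" % (time.perf_counter() - start))

# One call for the whole batch: the statement is prepared once and reused.
start = time.perf_counter()
cursor.executemany("INSERT INTO tweets VALUES (?, ?, ?)", rows)
conn.commit()
print("executemany(): %.3f s" % (time.perf_counter() - start))

conn.close()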
Python's sqlite3 module integrates the SQLite database engine with Python. It implements the standard Python DB-API 2.0 and provides a straightforward, simple-to-use interface for interacting with SQLite databases.
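To see that interface in isolation, here is a minimal, self-contained sketch using an in-memory database (the table and data are invented for illustration):

import sqlite3

# Standard DB-API 2.0 flow: connect, get a cursor, execute, commit, close.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE demo(id INTEGER PRIMARY KEY, name TEXT)")
cursor.execute("INSERT INTO demo(name) VALUES (?)", ("alice",))
conn.commit()
print(cursor.execute("SELECT * FROM demo").fetchall())
conn.close()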
The SQLite docs explain why this is so slow: "Transaction speed is limited by disk drive speed because (by default) SQLite actually waits until the data really is safely stored on the disk surface before the transaction is complete. That way, if you suddenly lose power or if your OS crashes, your data is still safe."
SQLite is fast and reliable, and it requires no configuration or maintenance. It keeps things simple. SQLite "just works".
As I understand it, the main reason for the poor performance is the time wasted committing many separate SQLite transactions. So what can you do?
1. Drop the indexes, then
2. PRAGMA synchronous = OFF (or NORMAL)
3. Insert blocks of N rows (pick an N; try N=5000 to start). Before inserting a block, issue BEGIN TRANSACTION; after inserting it, issue COMMIT. (A sketch of this follows the list.)
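A sketch of how those steps can be combined with the original executemany() call (tweets.db, the placeholder to_write data, and N=5000 are illustrative; PRAGMA synchronous = OFF trades crash safety for speed, so only use it if you can rebuild the database):

import sqlite3

N = 5000  # block size to start experimenting with

# Placeholder data standing in for the question's to_write list of
# (tweet_hash, tweet_id, tweet_text) tuples, so the sketch runs on its own.
to_write = [("hash%d" % i, i, "text %d" % i) for i in range(20000)]

# isolation_level=None puts the connection in autocommit mode, so we control
# transactions explicitly with BEGIN TRANSACTION / COMMIT.
conn = sqlite3.connect("tweets.db", isolation_level=None)
cursor = conn.cursor()
cursor.execute(
    """CREATE TABLE IF NOT EXISTS tweets(
           tweet_hash TEXT PRIMARY KEY ON CONFLICT REPLACE,
           tweet_id INTEGER,
           tweet_text TEXT)"""
)

# Don't wait for the data to reach the disk surface after every write.
cursor.execute("PRAGMA synchronous = OFF")

# Insert in blocks of N rows, one transaction per block.
for start in range(0, len(to_write), N):
    block = to_write[start:start + N]
    cursor.execute("BEGIN TRANSACTION")
    cursor.executemany("INSERT INTO tweets VALUES (?, ?, ?)", block)
    cursor.execute("COMMIT")

conn.close()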
See also http://www.sqlite.org/faq.html#q19