Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do inserts slow down so much as DB grows?

I'm doing a personal project that generates a lot of data, and I thought that storing it in a local DB made sense. However, I'm seeing insane slowdown as the DB grows, which makes it infeasible to run.

I made a simple test showing what I'm doing. I make a dictionary where I do a bunch of local processing (roughly 1 million entries), then batch insert that into the SQLite DB, then loop and do it all again. Here's the code:

from collections import defaultdict
import sqlite3
import datetime
import random

def log(s):
    now = datetime.datetime.now()
    print(str(now) + ": " + str(s))

def create_table():
    conn = create_connection()
    with conn:
        cursor = conn.cursor()

        sql = """
            CREATE TABLE IF NOT EXISTS testing (
                test text PRIMARY KEY,
                number integer
            );"""
        cursor.execute(sql)
    conn.close()

def insert_many(local_db):
    sql = """INSERT INTO testing(test, number) VALUES(?, ?) ON CONFLICT(test) DO UPDATE SET number=number+?;"""

    inserts = []
    for key, value in local_db.items():
        inserts.append((key, value, value))

    conn = create_connection()
    with conn:
        cursor = conn.cursor()
        cursor.executemany(sql, inserts)
    conn.close()

def main():
    i = 0
    log("Starting to process records")
    for i in range(1, 21):
        local_db = defaultdict(int)
        for j in range(0, 1000000):
            s = "Testing insertion " + str(random.randrange(100000000))
            local_db[s] += 1
        log("Created local DB for " + str(1000000 * i) + " records")
        insert_many(local_db)
        log("Finished inserting " + str(1000000 * i) + " records")

def create_connection():
    conn = None
    try:
        conn = sqlite3.connect('/home/testing.db')
    except Error as e:
        print(e)

    return conn

if __name__ == '__main__':
    create_table()
    main()

This runs great for a second, then slows down like crazy. Here's the output I just got:

2019-10-23 15:28:59.211036: Starting to process records
2019-10-23 15:29:01.308668: Created local DB for 1000000 records
2019-10-23 15:29:10.147762: Finished inserting 1000000 records
2019-10-23 15:29:12.258012: Created local DB for 2000000 records
2019-10-23 15:29:28.752352: Finished inserting 2000000 records
2019-10-23 15:29:30.853128: Created local DB for 3000000 records
2019-10-23 15:39:12.826357: Finished inserting 3000000 records
2019-10-23 15:39:14.932100: Created local DB for 4000000 records
2019-10-23 17:21:37.257651: Finished inserting 4000000 records
...

As you can see, the first million inserts take 9 seconds, then the next million take 16, then it balloons to 10 minutes, then an hour and 40 minutes (!). Is there something weird I'm doing that causes this crazy slowdown, or is this a limitation of sqlite?

like image 571
user1902853 Avatar asked Feb 27 '26 07:02

user1902853


1 Answers

(More of an extended comment than an answer)

SQLite only supports BTree indexes. For strings, which might have different length, the tree is storing row ids. Reading complexity of a tree is O(log(n)), where n is the length of the table, however, it will be multiplied by the complexity of reading and comparing a string value from the table. So, unless there is a really good reason to, it is better to have an integer field as a primary key.

What makes the situation worse in this case, the strings you're inserting have fairly long shared prefix ("Testing insertion "), so searching for the first mismatch takes longer.

Speedup suggestions, in order of expected effect size:

  • Real databases (MariaDB, Postgres) support hash indexes, which will solve this problem.
  • disable autocommit (skip unnecessary disk writes; very expensive)
  • reverse the text string (number before fixed text), or even just leave the number part only
  • use mass insertions (multiple records in one statement)

The @peak's answer avoids the whole problem by not using index. If index is not required at all, this is definitely a way to go.

like image 180
Marat Avatar answered Mar 01 '26 19:03

Marat



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!