
Moving back and forth between an on-disk database and a fast in-memory database?

Tags: python, sqlite

Python's sqlite3 :memory: option provides speedier queries and updates than the equivalent on-disk database. How can I load a disk-based database into memory, do fast operations on it, and then write the updated version back to disk?

The question "How to browse an in memory sqlite database in python" seems related, but it focuses on how to use a disk-based browsing tool on an in-memory db. The question "How can I copy an in-memory SQLite database to another in-memory SQLite database in Python?" is also related, but it is specific to Django.

My current solution is to read all of the tables, one at a time, from the disk-based database into lists of tuples, then manually recreate the entire database schema for the in-memory db, and then load the data from the lists of tuples into the in-memory db. After operating on the data, the process is reversed to write the results back to disk.

There must be a better way!

Asked by Raymond Hettinger, Jul 02 '15 18:07



2 Answers

The answer at "How to load existing db file to memory in Python sqlite3?" provided the important clues. Building on that answer, here is a simplification and generalization of that code.

It eliminates the unnecessary use of StringIO and is packaged into reusable form for both reading into and writing from an in-memory database.

import sqlite3

def copy_database(source_connection, dest_dbname=':memory:'):
    '''Return a connection to a new copy of an existing database.
       Raises an sqlite3.OperationalError if the destination already
       contains the tables being copied.
    '''
    script = ''.join(source_connection.iterdump())
    dest_conn = sqlite3.connect(dest_dbname)
    dest_conn.executescript(script)
    return dest_conn

if __name__ == '__main__':
    from contextlib import closing

    with closing(sqlite3.connect('pepsearch.db')) as disk_db:
        mem_db = copy_database(disk_db)

    # Standard SQL uses single quotes for string literals.
    mem_db.execute("DELETE FROM documents WHERE uri = 'pep-3154'")
    mem_db.commit()

    copy_database(mem_db, 'changed.db').close()
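
On Python 3.7 or later, the same round trip can probably be done without dumping to SQL text at all, because sqlite3 connections have a backup() method that copies the database page by page. This is a minimal sketch of that alternative, not the code from the answer above; the helper names load_into_memory and save_to_disk are made up for illustration.

```python
import sqlite3

def load_into_memory(disk_path):
    """Return an in-memory connection holding a copy of the database at disk_path.

    Hypothetical helper; assumes Python 3.7+ for Connection.backup().
    """
    disk_conn = sqlite3.connect(disk_path)
    mem_conn = sqlite3.connect(':memory:')
    try:
        # Copy every page of the on-disk database into the in-memory one.
        disk_conn.backup(mem_conn)
    finally:
        disk_conn.close()
    return mem_conn

def save_to_disk(mem_conn, disk_path):
    """Write the in-memory database to disk_path, replacing its contents.

    Hypothetical helper; commit pending changes on mem_conn before calling it.
    """
    disk_conn = sqlite3.connect(disk_path)
    try:
        mem_conn.backup(disk_conn)
    finally:
        disk_conn.close()
```

Usage would mirror the example above: mem_db = load_into_memory('pepsearch.db'), run the updates and commit, then save_to_disk(mem_db, 'changed.db'). Unlike the iterdump() approach, backup() overwrites whatever the destination already holds, so it should not fail when the target file already has the same tables.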
Answered by Raymond Hettinger, Oct 30 '22 01:10


Frankly, I wouldn't fool around too much with in-memory databases unless you really do need an indexed structure that you know will always fit entirely within available memory. SQLite is extremely smart about its I/O, especially when you wrap everything (including reads) in transactions, as you should. It very efficiently keeps things in memory while it manipulates data structures that fundamentally live on external storage, and yet it will never exhaust memory, nor take too much of it. I think that RAM really does work better as "a buffer" than as the primary place where data is stored, especially in a virtual storage environment, where everything must be considered "backed by external storage anyway."
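
As a rough sketch of what that advice might look like in practice, the snippet below keeps the database on disk, enlarges SQLite's page cache, and groups the writes into one transaction. The helper names, the 64 MiB cache size, and the documents/uri schema (borrowed from the example in the other answer) are assumptions for illustration, not measured recommendations.

```python
import sqlite3

def open_with_cache(path, cache_kib=65536):
    """Open an on-disk database with a larger page cache.

    Hypothetical helper; the default of 65536 KiB is an illustrative guess.
    """
    conn = sqlite3.connect(path)
    # SQLite interprets a negative cache_size as a size in KiB, not pages.
    conn.execute(f'PRAGMA cache_size = {-int(cache_kib)}')
    return conn

def delete_uris(conn, uris):
    """Delete several rows from the assumed documents table in one transaction."""
    # As a context manager, the connection commits on success
    # and rolls back if an exception is raised.
    with conn:
        conn.executemany('DELETE FROM documents WHERE uri = ?',
                         [(uri,) for uri in uris])
```

Whether this gets close to :memory: speed depends on the workload, so it is worth timing both approaches before committing to either.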

Answered by Mike Robinson, Oct 30 '22 02:10