Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge multiple backups of the same table schema into 1 master table

Tags:

python

sql

sqlite

I have about 200 copies of a SQLite database. All taken at different times with different data in them. Some rows are deleted and some are added. They are all in a single directory.

I want to merge all the rows in the table my_table, using all the .db files in the directory. I want duplicate rows to be deleted, showing all entires from all the databases, just once.

I'd like to do this in pure SQL, but I don't think it's possible, so we can use Python too.

Table definition:

CREATE TABLE my_table (
    ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
    guid TEXT UNIQUE NOT NULL,
    text TEXT,
    replace INTEGER DEFAULT 0,
    service_center TEXT,
    handle_id INTEGER DEFAULT 0,
    subject TEXT,
    country TEXT,
    attributedBody BLOB,
    version INTEGER DEFAULT 0,
    type INTEGER DEFAULT 0,
    service TEXT,
    account TEXT,
    account_guid TEXT,
    error INTEGER DEFAULT 0,
    date INTEGER,
    date_read INTEGER,
    date_delivered INTEGER,
    is_delivered INTEGER DEFAULT 0,
    is_finished INTEGER DEFAULT 0,
    is_emote INTEGER DEFAULT 0,
    is_from_me INTEGER DEFAULT 0,
    is_empty INTEGER DEFAULT 0,
    is_delayed INTEGER DEFAULT 0,
    is_auto_reply INTEGER DEFAULT 0,
    is_prepared INTEGER DEFAULT 0,
    is_read INTEGER DEFAULT 0,
    is_system_message INTEGER DEFAULT 0,
    is_sent INTEGER DEFAULT 0,
    has_dd_results INTEGER DEFAULT 0,
    is_service_message INTEGER DEFAULT 0,
    is_forward INTEGER DEFAULT 0,
    was_downgraded INTEGER DEFAULT 0,
    is_archive INTEGER DEFAULT 0,
    cache_has_attachments INTEGER DEFAULT 0,
    cache_roomnames TEXT,
    was_data_detected INTEGER DEFAULT 0,
    was_deduplicated INTEGER DEFAULT 0,
    is_audio_message INTEGER DEFAULT 0,
    is_played INTEGER DEFAULT 0,
    date_played INTEGER,
    item_type INTEGER DEFAULT 0,
    other_handle INTEGER DEFAULT -1,
    group_title TEXT,
    group_action_type INTEGER DEFAULT 0,
    share_status INTEGER,
    share_direction INTEGER,
    is_expirable INTEGER DEFAULT 0,
    expire_state INTEGER DEFAULT 0,
    message_action_type INTEGER DEFAULT 0,
    message_source INTEGER DEFAULT 0
)
like image 929
User Avatar asked Jan 17 '16 22:01

User


2 Answers

To be able to access both the master database and a snapshot at the same time, use ATTACH. To delete and old version of a row, use INSERT OR REPLACE:

ATTACH 'snapshot123.db' AS snapshot;
INSERT OR REPLACE INTO main.my_table SELECT * FROM snapshot.my_table;
DETACH snapshot;

Do this with all databases, in order from oldest to newest. (SQLite has no loop control mechanism for this; do this loop in Python.)

Alternatively, you can go backwards, from newest and oldest, and insert only rows that do not yet exist:

ATTACH 'snapshot123.db' AS snapshot;
INSERT OR IGNORE INTO main.my_table SELECT * FROM snapshot.my_table;
DETACH snapshot;
like image 131
CL. Avatar answered Nov 15 '22 19:11

CL.


import sqlite3
conn_1 = sqlite3.connect('master.db')
c = conn_1.cursor()
import glob, os
os.chdir("/Users/me/Downloads/Archives")
for file in glob.glob("*.db"):
    if file != "master.db":
        print file
        conn_2 = sqlite3.connect(file)
            c2 = conn_2.cursor()
            query = "ATTACH '%s' AS snapshot;" % file
            query += "INSERT OR IGNORE INTO master.my_table SELECT * FROM snapshot.my_table;"
            query += "DETACH snapshot;"
            c.executescript(query)

Error: sqlite3.OperationalError: no such table: master.my_table

like image 1
User Avatar answered Nov 15 '22 19:11

User