I have about 200 copies of a SQLite database. All taken at different times with different data in them. Some rows are deleted and some are added. They are all in a single directory.
I want to merge all the rows in the table my_table
, using all the .db
files in the directory. I want duplicate rows to be deleted, showing all entires from all the databases, just once.
I'd like to do this in pure SQL, but I don't think it's possible, so we can use Python too.
Table definition:
CREATE TABLE my_table (
ROWID INTEGER PRIMARY KEY AUTOINCREMENT,
guid TEXT UNIQUE NOT NULL,
text TEXT,
replace INTEGER DEFAULT 0,
service_center TEXT,
handle_id INTEGER DEFAULT 0,
subject TEXT,
country TEXT,
attributedBody BLOB,
version INTEGER DEFAULT 0,
type INTEGER DEFAULT 0,
service TEXT,
account TEXT,
account_guid TEXT,
error INTEGER DEFAULT 0,
date INTEGER,
date_read INTEGER,
date_delivered INTEGER,
is_delivered INTEGER DEFAULT 0,
is_finished INTEGER DEFAULT 0,
is_emote INTEGER DEFAULT 0,
is_from_me INTEGER DEFAULT 0,
is_empty INTEGER DEFAULT 0,
is_delayed INTEGER DEFAULT 0,
is_auto_reply INTEGER DEFAULT 0,
is_prepared INTEGER DEFAULT 0,
is_read INTEGER DEFAULT 0,
is_system_message INTEGER DEFAULT 0,
is_sent INTEGER DEFAULT 0,
has_dd_results INTEGER DEFAULT 0,
is_service_message INTEGER DEFAULT 0,
is_forward INTEGER DEFAULT 0,
was_downgraded INTEGER DEFAULT 0,
is_archive INTEGER DEFAULT 0,
cache_has_attachments INTEGER DEFAULT 0,
cache_roomnames TEXT,
was_data_detected INTEGER DEFAULT 0,
was_deduplicated INTEGER DEFAULT 0,
is_audio_message INTEGER DEFAULT 0,
is_played INTEGER DEFAULT 0,
date_played INTEGER,
item_type INTEGER DEFAULT 0,
other_handle INTEGER DEFAULT -1,
group_title TEXT,
group_action_type INTEGER DEFAULT 0,
share_status INTEGER,
share_direction INTEGER,
is_expirable INTEGER DEFAULT 0,
expire_state INTEGER DEFAULT 0,
message_action_type INTEGER DEFAULT 0,
message_source INTEGER DEFAULT 0
)
To be able to access both the master database and a snapshot at the same time, use ATTACH. To delete and old version of a row, use INSERT OR REPLACE:
ATTACH 'snapshot123.db' AS snapshot;
INSERT OR REPLACE INTO main.my_table SELECT * FROM snapshot.my_table;
DETACH snapshot;
Do this with all databases, in order from oldest to newest. (SQLite has no loop control mechanism for this; do this loop in Python.)
Alternatively, you can go backwards, from newest and oldest, and insert only rows that do not yet exist:
ATTACH 'snapshot123.db' AS snapshot;
INSERT OR IGNORE INTO main.my_table SELECT * FROM snapshot.my_table;
DETACH snapshot;
import sqlite3
conn_1 = sqlite3.connect('master.db')
c = conn_1.cursor()
import glob, os
os.chdir("/Users/me/Downloads/Archives")
for file in glob.glob("*.db"):
if file != "master.db":
print file
conn_2 = sqlite3.connect(file)
c2 = conn_2.cursor()
query = "ATTACH '%s' AS snapshot;" % file
query += "INSERT OR IGNORE INTO master.my_table SELECT * FROM snapshot.my_table;"
query += "DETACH snapshot;"
c.executescript(query)
Error: sqlite3.OperationalError: no such table: master.my_table
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With