Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQLite: efficient way to drop lots of rows

SQlite, Android, true story. I have a table, which I use as a cache:

CREATE TABLE cache(key TEXT, ts TIMESTAMP, size INTEGER, data BLOB);
CREATE UNIQUE INDEX by_key ON cache(key);
CREATE INDEX by_ts ON cache(ts);

During app lifetime I fill the cache and at some point I want to clear it out and drop N records. Typically this table will contain ~25000 blobs ~100-500Kb each, total blobs size in the DB is 600-800Mb, but now I test for ~2000 which are about 60Mb (following numbers are for this case). Clear removes 90% of cache entries.

I tried different ways to do it, here brief description:

[1] Worst and simplest. First select, than remove one by one, walking cursor. Terribly slow.

[2] Make SQLite to do it with query (delete blobs with totally N bytes in them):

DELETE FROM blobs WHERE
  ROWID IN (SELECT ROWID FROM blobs WHERE 
             (SELECT SUM(size) FROM blobs AS _ WHERE ts <= blobs.ts) <= N);

This is faster, but still terribly slow: ~15 sec. Seems also it like it has quadratic complexity.

[3] Select row around where to remove (using average blob size for computations) and delete with simple WHERE clause:

-- Find row after which to delete, let it's time stamp is T0:
SELECT ts FROM cache ORDER BY ts LIMIT 1 OFFSET count;
-- Delete
DELETE FROM cache WHERE ts < T0;

This is much better, but takes ~7 sec.

[4] Create new table, copy that I need to save and drop old one. Note, that I create index in the new table AFTER I copied all this stuff:

  -- Insert only rows I want leave
  INSERT INTO temp(key, ts, size, data) SELECT key, ts, size, data 
    FROM cache ORDER BY ts LIMIT count;
  -- Drop table and indices.
  DROP INDEX by_key;
  DROP INDEX by_ts;
  DROP TABLE cache;
  -- Rename temp table and create indices...

Copying takes ~300ms for 6Mb for blobs. But DROP TABLE is about ~8 sec.

Note in all cases I do VACUUM which takes another ~1 sec. How can I make it fast? Why DROP TABLE and deletion are so slow? I think it might be because of indices: when I dropped key index before DELETE it worked faster. How to make SQLite delete fast?

like image 549
demi Avatar asked Oct 23 '13 00:10

demi


People also ask

How do I delete multiple records in SQLite?

If you wanted to delete a number of rows within a range, you can use the AND operator with the BETWEEN operator. DELETE FROM table_name WHERE column_name BETWEEN value 1 AND value 2; Another way to delete multiple rows is to use the IN operator.

What is the difference between SQL and SQLite?

SQL is Structured Query Language which is used with databases like MySQL, Oracle, Microsoft SQL Server, IBM DB2, etc. SQLite is portable database resource. It could get an extension in whatever programming language used to access that database.

How do you drop a table in Python?

To drop a table from a SQLite3 database using python invoke the execute() method on the cursor object and pass the drop statement as a parameter to it.


2 Answers

You are working on a database with "big" data - ie, each blob using multiple pages.

At some point near optimal performance you will reach a limit you can't improve.

Checking all your choices, I see different behaviors, not just different algorithms.

[1] This one shouldn't be terrible slow as long as you use a transaction. You need two operations at once, query (to get blob size) and delete.

[2] This is a good approach. As two queries and a delete, all in a single command, so SQLite engine will optimize.

[3] This is a different behaviour from all before. Same as DELETE FROM cache WHERE ts < (SELECT ts FROM cache ORDER BY ts LIMIT 1 OFFSET count). Query is less expensive then previous, but I bet number of rows deleted are far less then previous one! Expensive part of query/delete will be delete! Query optimization is important, but things will always get slower in delete.

[4] This is a very bad approach!!! Copying all your data to a new table - maybe another database - will be VERY expensive. I only get one advantage from this: you may copy data to a new database and avoid VACUUM, as new database was build from base and it's clean.

About VACUUM... Worst then DELETE is VACUUM. Vacuum is not supposed to be used often in a database. I understand this algorithm is supposed to "clean" your database, but cleaning shouldn't be a frequent operation - databases are optimized for select/insert/delete/update - not to keep all data at a minimal size.

My choice would be using a DELETE ... IN (SELECT ...) single operation, according to predefined criteria. VACUUM wouldn't be used, at least not so often. One good choice would be monitor db size - when this size run over a limit, run a assumed expensive cleaning to trim database.

At last, when using multiple commands, never forget to use transactions!

like image 143
LS_ᴅᴇᴠ Avatar answered Oct 01 '22 17:10

LS_ᴅᴇᴠ


Apparently, what is slow is not finding the records to be deleted, but the actual deletion itself.

Check if PRAGMA secure_delete is set by default in your Android's SQLite. You should just disable it, just to be sure.

You do not need to run VACUUM; SQLite automatically resuses freed pages. You need VACUUM only if you actually know that the database will not grow again in size in the future.

like image 30
CL. Avatar answered Oct 01 '22 17:10

CL.