I'm having a performance problem in SQLite with a SELECT COUNT(*) on a large tables.
As I didn't yet receive a usable answer and I did some further testing, I edited my question to incorporate my new findings.
I have 2 tables:
CREATE TABLE Table1 ( Key INTEGER NOT NULL, ... several other fields ..., Status CHAR(1) NOT NULL, Selection VARCHAR NULL, CONSTRAINT PK_Table1 PRIMARY KEY (Key ASC)) CREATE Table2 ( Key INTEGER NOT NULL, Key2 INTEGER NOT NULL, ... a few other fields ..., CONSTRAINT PK_Table2 PRIMARY KEY (Key ASC, Key2 ASC))
Table1 has around 8 million records and Table2 has around 51 million records, and the databasefile is over 5GB.
Table1 has 2 more indexes:
CREATE INDEX IDX_Table1_Status ON Table1 (Status ASC, Key ASC) CREATE INDEX IDX_Table1_Selection ON Table1 (Selection ASC, Key ASC)
"Status" is required field, but has only 6 distinct values, "Selection" is not required and has only around 1.5 million values different from null and only around 600k distinct values.
I did some tests on both tables, you can see the timings below, and I added the "explain query plan" for each request (QP). I placed the database file on an USB-memorystick so i could remove it after each test and get reliable results without interference of the disk cache. Some requests are faster on USB (I suppose due to lack of seektime), but some are slower (table scans).
SELECT COUNT(*) FROM Table1 Time: 105 sec QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~1000000 rows) SELECT COUNT(Key) FROM Table1 Time: 153 sec QP: SCAN TABLE Table1 (~1000000 rows) SELECT * FROM Table1 WHERE Key = 5123456 Time: 5 ms QP: SEARCH TABLE Table1 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1 Time: 16 sec QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows) SELECT * FROM Table1 WHERE Selection = 'SomeValue' AND Key > 5123456 LIMIT 1 Time: 9 ms QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Selection (Selection=?) (~3 rows)
As you can see the counts are very slow, but normal selects are fast (except for the 2nd one, which took 16 seconds).
The same goes for Table2:
SELECT COUNT(*) FROM Table2 Time: 528 sec QP: SCAN TABLE Table2 USING COVERING INDEX sqlite_autoindex_Table2_1(~1000000 rows) SELECT COUNT(Key) FROM Table2 Time: 249 sec QP: SCAN TABLE Table2 (~1000000 rows) SELECT * FROM Table2 WHERE Key = 5123456 AND Key2 = 0 Time: 7 ms QP: SEARCH TABLE Table2 USING INDEX sqlite_autoindex_Table2_1 (Key=? AND Key2=?) (~1 rows)
Why is SQLite not using the automatically created index on the primary key on Table1 ? And why, when he uses the auto-index on Table2, it still takes a lot of time ?
I created the same tables with the same content and indexes on SQL Server 2008 R2 and there the counts are nearly instantaneous.
One of the comments below suggested executing ANALYZE on the database. I did and it took 11 minutes to complete. After that, I ran some of the tests again:
SELECT COUNT(*) FROM Table1 Time: 104 sec QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~7848023 rows) SELECT COUNT(Key) FROM Table1 Time: 151 sec QP: SCAN TABLE Table1 (~7848023 rows) SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1 Time: 5 ms QP: SEARCH TABLE Table1 USING INTEGER PRIMARY KEY (rowid>?) (~196200 rows) SELECT COUNT(*) FROM Table2 Time: 529 sec QP: SCAN TABLE Table2 USING COVERING INDEX sqlite_autoindex_Table2_1(~51152542 rows) SELECT COUNT(Key) FROM Table2 Time: 249 sec QP: SCAN TABLE Table2 (~51152542 rows)
As you can see, the queries took the same time (except the query plan is now showing the real number of rows), only the slower select is now also fast.
Next, I create dan extra index on the Key field of Table1, which should correspond to the auto-index. I did this on the original database, without the ANALYZE data. It took over 23 minutes to create this index (remember, this is on an USB-stick).
CREATE INDEX IDX_Table1_Key ON Table1 (Key ASC)
Then I ran the tests again:
SELECT COUNT(*) FROM Table1 Time: 4 sec QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Key(~1000000 rows) SELECT COUNT(Key) FROM Table1 Time: 167 sec QP: SCAN TABLE Table2 (~1000000 rows) SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1 Time: 17 sec QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)
As you can see, the index helped with the count(*), but not with the count(Key).
Finaly, I created the table using a column constraint instead of a table constraint:
CREATE TABLE Table1 ( Key INTEGER PRIMARY KEY ASC NOT NULL, ... several other fields ..., Status CHAR(1) NOT NULL, Selection VARCHAR NULL)
Then I ran the tests again:
SELECT COUNT(*) FROM Table1 Time: 6 sec QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~1000000 rows) SELECT COUNT(Key) FROM Table1 Time: 28 sec QP: SCAN TABLE Table1 (~1000000 rows) SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1 Time: 10 sec QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)
Although the query plans are the same, the times are a lot better. Why is this ?
The problem is that ALTER TABLE does not permit to convert an existing table and I have a lot of existing databases which i can not convert to this form. Besides, using a column contraint instead of table constraint won't work for Table2.
Has anyone any idea what I am doing wrong and how to solve this problem ?
I used System.Data.SQLite version 1.0.74.0 to create the tables and to run the tests I used SQLiteSpy 1.9.1.
Thanks,
Marc
So to make SELECT COUNT(*) queries fast, here's what to do:Get on any version that supports batch mode on columnstore indexes, and put a columnstore index on the table – although your experiences are going to vary dramatically depending on the kind of query you have.
7.2.SQLite provides the ability for advanced programmers to exercise control over the query plan chosen by the optimizer.
Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive.
If you haven't DELETE
d any records, doing:
SELECT MAX(_ROWID_) FROM "table" LIMIT 1;
will avoid the full-table scan.
Note that _ROWID_
is a SQLite identifier.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With