
SQLite join optimisation

If you have a query such as:

select a.Name, a.Description from a
inner join b on a.id1 = b.id1
inner join c on b.id2 = c.id2
group by a.Name, a.Description

Which columns would be best to index for this query in SQLite, given that each of the tables has over 100,000 rows?

I ask because, when I apply the same optimisation, the query with the group by does not perform as well as I would expect from another RDBMS (SQL Server).

Would I be right in thinking that all columns referenced on a single table in a query in SQLite need to be included in a single composite index for best performance?

gmn asked Nov 15 '10 10:11


2 Answers

The problem is that you're expecting SQLite to have the same performance characteristics as a full RDBMS. It won't. SQLite doesn't have the luxury of caching as much in memory, has to rebuild its cache every time you run the application, is probably limited to a set number of cores, and so on. Those are the tradeoffs of using an embedded RDBMS over a full one.

As far as optimizations go, try indexing the lookup columns and test. Then try creating a covering index. Be sure to test both selects and the code paths that update the database, because every index you add speeds up reads at the expense of writes. Find the indexing that gives the best balance between the two for your needs and go with it.
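To make the "index, then test both sides" advice concrete, here is a rough timing sketch using Python's built-in sqlite3 module. The table layouts, the index names (idx_b_id1, idx_c_id2), the row count and the helper names build_db and time_select are all assumptions made up for illustration; only the query shape comes from the question. Treat the timings as indicative only and rerun against your real data.

```python
import sqlite3
import time

# Same shape as the query in the question.
QUERY = """
    SELECT a.Name, a.Description FROM a
    INNER JOIN b ON a.id1 = b.id1
    INNER JOIN c ON b.id2 = c.id2
    GROUP BY a.Name, a.Description
"""

def build_db(with_indexes, rows=2000):
    """Create an in-memory database and return (conn, seconds spent inserting)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the question does not show the real table definitions.
    conn.executescript("""
        CREATE TABLE a (id1 INTEGER, Name TEXT, Description TEXT);
        CREATE TABLE b (id1 INTEGER, id2 INTEGER);
        CREATE TABLE c (id2 INTEGER);
    """)
    if with_indexes:
        # Indexes on the join (lookup) columns, created before loading so the
        # insert timing includes the cost of maintaining them.
        conn.executescript("""
            CREATE INDEX idx_b_id1 ON b (id1);
            CREATE INDEX idx_c_id2 ON c (id2);
        """)
    start = time.perf_counter()
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)",
                     [(i, f"name{i % 50}", f"desc{i % 50}") for i in range(rows)])
    conn.executemany("INSERT INTO b VALUES (?, ?)", [(i, i) for i in range(rows)])
    conn.executemany("INSERT INTO c VALUES (?)", [(i,) for i in range(rows)])
    conn.commit()
    return conn, time.perf_counter() - start

def time_select(conn):
    """Run the join query and return (result rows, seconds spent)."""
    start = time.perf_counter()
    result = conn.execute(QUERY).fetchall()
    return result, time.perf_counter() - start

for with_indexes in (False, True):
    conn, write_s = build_db(with_indexes)
    groups, read_s = time_select(conn)
    print(f"indexes={with_indexes}: writes {write_s:.4f}s, "
          f"select {read_s:.4f}s, {len(groups)} groups")
    conn.close()
```

On a small in-memory dataset like this the differences may be tiny or noisy, so it is worth scaling rows up towards your real table sizes and repeating each measurement several times before drawing conclusions.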

Donnie answered Sep 21 '22 20:09


From the SQLite query optimization overview:

When doing an indexed lookup of a row, the usual procedure is to do a binary search on the index to find the index entry, then extract the rowid from the index and use that rowid to do a binary search on the original table. Thus a typical indexed lookup involves two binary searches. If, however, all columns that were to be fetched from the table are already available in the index itself, SQLite will use the values contained in the index and will never look up the original table row. This saves one binary search for each row and can make many queries run twice as fast.

For any other RDBMS, I'd say to put a clustered index on b.id1 and c.id2. For SQLite, you might be better off also including in those indexes any columns from b and c that you want to look up.
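A quick way to check whether an index actually covers a lookup is EXPLAIN QUERY PLAN. The sketch below, in Python with the built-in sqlite3 module, compares a single-column index on b.id1 with a composite index on (id1, id2) for a simple lookup on b. The table definition, the payload column, the index names and the plan_for helper are assumptions for illustration, not taken from the question.

```python
import sqlite3

def plan_for(index_sql, query, params=()):
    """Return the EXPLAIN QUERY PLAN detail strings for `query`,
    run against a fresh table b that has exactly one index."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical layout for table b; payload stands in for any other columns.
    conn.execute("CREATE TABLE b (id1 INTEGER, id2 INTEGER, payload TEXT)")
    conn.execute(index_sql)
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]

# The kind of lookup the join performs on b: find id2 for a given id1.
LOOKUP = "SELECT id2 FROM b WHERE id1 = ?"

# Index on the lookup column only: SQLite still has to visit the table row for id2.
plain = plan_for("CREATE INDEX idx_b_id1 ON b (id1)", LOOKUP, (1,))

# Index that also contains id2: the lookup can be answered from the index alone.
covering = plan_for("CREATE INDEX idx_b_id1_id2 ON b (id1, id2)", LOOKUP, (1,))

print("single-column index:", plain)
print("composite index:    ", covering)
```

The exact wording of the plan text varies between SQLite versions, but the composite-index plan should mention a covering index while the single-column one should not. Running the same check on the full three-table query shows which join order and indexes the planner picks for your data.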

littlegreen answered Sep 19 '22 20:09