
Optimize SQL Query on SQLite3 by using indexes

I'm trying to optimize a SQL query by creating indexes to get the best performance.

Table definition

CREATE TABLE Mots (
  numero            INTEGER NOT NULL, 
  fk_dictionnaires integer(5) NOT NULL, 
  mot              varchar(50) NOT NULL, 
  ponderation      integer(20) NOT NULL,
  drapeau varchar(1) NOT NULL,
  CONSTRAINT pk_mots PRIMARY KEY(numero),
  CONSTRAINT uk_dico_mot_mots UNIQUE(fk_dictionnaires, mot),
  CONSTRAINT fk_mots_dictionnaires FOREIGN KEY(fk_dictionnaires) REFERENCES Dictionnaires(numero)
  );

Index definitions

CREATE INDEX idx_dictionnaires ON mots(fk_dictionnaires DESC);
CREATE INDEX idx_mots_ponderation ON mots(ponderation);
CREATE UNIQUE INDEX idx_mots_unique ON mots(fk_dictionnaires, mot);

SQL query:

SELECT numero, mot, ponderation, drapeau 
FROM mots 
WHERE mot LIKE 'ar%' 
   AND fk_dictionnaires=1 
   AND LENGTH(mot)>=4 
   ORDER BY ponderation DESC 
LIMIT 5;

Query Plan

0|0|0|SEARCH TABLE mots USING INDEX idx_dictionnaires (fk_dictionnaires=?) (~2 rows)
0|0|0|USE TEMP B-TREE FOR ORDER BY

The indexes I defined don't seem to be used, and according to .timer the query takes:

CPU Time: user 0.078001 sys 0.015600

However, when I remove the fk_dictionnaires=1 condition, my indexes are used correctly and the query takes around 0.000000-0.01XXXXXX sec:

0|0|0|SCAN TABLE mots USING INDEX idx_mots_ponderation (~250000 rows)

I found some similar questions on Stack Overflow, but none of the answers helped me.

  • Removing a Temporary B Tree Sort from a SQLite Query
  • Similar issue

How can I improve performance by using indexes and/or by changing the SQL query? Thanks in advance.

A. Geiser asked Aug 16 '12 09:08





1 Answer

SQLite seems to think that the idx_dictionnaires index is very selective: the plan estimates that a search using idx_dictionnaires will only have to examine a couple of rows (~2). However, the timings you quote suggest that it must be examining far more than that. As a first step, try running ANALYZE mots, so that SQLite has up-to-date information on the cardinality of each index.
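As a rough sketch of what ANALYZE records, the following Python script (standard-library sqlite3 module) builds a simplified copy of mots, fills it with made-up rows that all belong to dictionary 1, runs ANALYZE, and reads the statistics back from sqlite_stat1. The row count, the sample words, and the omitted constraints are assumptions for illustration, not the OP's data.

```python
import sqlite3

# In-memory database with a simplified copy of the OP's table (assumption:
# the unique and foreign-key constraints are omitted for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mots (
  numero           INTEGER PRIMARY KEY,
  fk_dictionnaires INTEGER NOT NULL,
  mot              VARCHAR(50) NOT NULL,
  ponderation      INTEGER NOT NULL,
  drapeau          VARCHAR(1) NOT NULL
);
CREATE INDEX idx_dictionnaires ON mots(fk_dictionnaires DESC);
""")

# Hypothetical data: 1000 words, all in dictionary 1, so an equality test
# on fk_dictionnaires cannot narrow the search at all.
rows = [(1, f"mot{i}", i, "x") for i in range(1000)]
conn.executemany(
    "INSERT INTO mots (fk_dictionnaires, mot, ponderation, drapeau) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# Gather statistics for the table and its indexes.
conn.execute("ANALYZE mots")

# sqlite_stat1.stat starts with the number of rows in the index, followed by
# the average number of rows matched by an equality test on its first column.
stats = dict(
    conn.execute("SELECT idx, stat FROM sqlite_stat1 WHERE tbl = 'mots'")
)
print(stats["idx_dictionnaires"])
```

When the second number is close to the first, an equality test on that index barely narrows the search, which is the kind of information the planner lacks before ANALYZE has been run.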

Here is something else which might help, from the SQLite documentation:


Terms of the WHERE clause can be manually disqualified for use with indices by prepending a unary + operator to the column name. The unary + is a no-op and will not slow down the evaluation of the test specified by the term. But it will prevent the term from constraining an index. So, in the example above, if the query were rewritten as:

SELECT z FROM ex2 WHERE +x=5 AND y=6;

The + operator on the x column will prevent that term from constraining an index. This would force the use of the ex2i2 index.

Note that the unary + operator also removes type affinity from an expression, and in some cases this can cause subtle changes in the meaning of an expression. In the example above, if column x has TEXT affinity then the comparison "x=5" will be done as text. But the + operator removes the affinity. So the comparison "+x=5" will compare the text in column x with the numeric value 5 and will always be false.


If ANALYZE mots isn't enough to help SQLite choose the best index to use, you can use this feature to force it to use the index you want.
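Here is a small sketch of the unary + trick applied to the OP's query, using Python's standard sqlite3 module. The table is a simplified copy of mots and the inserted rows are invented for illustration. Which index SQLite picks for each form depends on the SQLite version and on any statistics it has, so the script prints the plans rather than assuming them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mots (
  numero           INTEGER PRIMARY KEY,
  fk_dictionnaires INTEGER NOT NULL,
  mot              VARCHAR(50) NOT NULL,
  ponderation      INTEGER NOT NULL,
  drapeau          VARCHAR(1) NOT NULL
);
CREATE INDEX idx_dictionnaires ON mots(fk_dictionnaires DESC);
CREATE INDEX idx_mots_ponderation ON mots(ponderation);
""")

# Hypothetical sample data: 100 words alternating between dictionaries 1 and 2.
conn.executemany(
    "INSERT INTO mots (fk_dictionnaires, mot, ponderation, drapeau) "
    "VALUES (?, ?, ?, ?)",
    [(1 + i % 2, f"armot{i}", i, "x") for i in range(100)],
)

def details(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

template = """
SELECT numero, mot, ponderation, drapeau
FROM mots
WHERE mot LIKE 'ar%' AND {fk} = 1 AND LENGTH(mot) >= 4
ORDER BY ponderation DESC
LIMIT 5
"""
plain = template.format(fk="fk_dictionnaires")
# The unary + stops this term from constraining an index.
forced = template.format(fk="+fk_dictionnaires")

plan_plain = details(plain)
plan_forced = details(forced)
for line in plan_plain:
    print("plain: ", line)
for line in plan_forced:
    print("forced:", line)

# Both forms return the same rows; only the access path may differ.
rows_plain = conn.execute(plain).fetchall()
rows_forced = conn.execute(forced).fetchall()
```

Because fk_dictionnaires holds integers and is compared with an integer, dropping the column's type affinity with + does not change the result here; with a TEXT column the caveat quoted above would apply.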

You could also try compound indexes -- it looks like you already defined one on fk_dictionnaires,mot, but SQLite isn't using it. For the "fast" query, SQLite seemed to prefer using the index on ponderation, to avoid sorting the rows at the end of the query. If you add an index on fk_dictionnaires,ponderation DESC, and SQLite actually uses it, it could pick out the rows which match fk_dictionnaires=1 without a table scan and avoid sorting at the end.
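The compound-index suggestion can be tried out with a short Python sketch (standard sqlite3 module). The table is a simplified copy of mots without the OP's other indexes and constraints, and the index name idx_dico_ponderation is a hypothetical name chosen for this example. The script prints the query plan before and after the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mots (
  numero           INTEGER PRIMARY KEY,
  fk_dictionnaires INTEGER NOT NULL,
  mot              VARCHAR(50) NOT NULL,
  ponderation      INTEGER NOT NULL,
  drapeau          VARCHAR(1) NOT NULL
);
""")

query = """
SELECT numero, mot, ponderation, drapeau
FROM mots
WHERE mot LIKE 'ar%' AND fk_dictionnaires = 1 AND LENGTH(mot) >= 4
ORDER BY ponderation DESC
LIMIT 5
"""

def details(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Without a suitable index, the rows have to be sorted after they are found.
plan_before = details(query)

# Compound index: equality column first, then the ORDER BY column in the
# same direction as the query asks for (hypothetical index name).
conn.execute(
    "CREATE INDEX idx_dico_ponderation "
    "ON mots(fk_dictionnaires, ponderation DESC)"
)
plan_after = details(query)

for line in plan_before:
    print("before:", line)
for line in plan_after:
    print("after: ", line)
```

In this stripped-down setup the "USE TEMP B-TREE FOR ORDER BY" step should disappear once the index exists, because the index already delivers the rows for fk_dictionnaires=1 in descending ponderation order. On the real table, with its other indexes, it is worth checking the plan again after running ANALYZE.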


POSTSCRIPT: The compound index I suggested above "fixed" the OP's performance problem, but he also asked how and why it works. @AGeiser, I'll use a brief illustration to try to help you understand DB indexes intuitively:

Imagine you need to find all the people in your town whose surnames start with "A". You have a directory of all the names, but they are in random order. What do you do? You have no choice but to read through the whole directory, and pick out the ones which start with "A". Sounds like a lot of work, right? (This is like a DB table with no indexes.)

But what if somebody gives you a phone book, with all the names in alphabetical order? Now you can just find the first and last entries which start with "A" (using something like a binary search), and take all the entries in that range. You don't have to even look at all the other names in the book. This will be way faster. (This is like a DB table with an index; in this case, call it an index on last_name,first_name.)

Now what if you want all the people whose names start with "A", but in the case that two people have the same name, you want them to be ordered by postal code? Even if you get the needed names quickly using the "phone book" (i.e. the index on last_name,first_name), you will still have to sort them all manually... so it starts sounding like a lot of work again. What could make this job really easy?

It would take another "phone book" -- but one in which the entries are ordered first by name, and then by postal code. With a "phone book" like that, you could quickly select the range of entries which you need, and you wouldn't even need to sort them -- they would already be in the desired order. (This is an index on last_name,first_name,postal_code.)

I think this illustration should make it clear how indexes can help SELECT queries, not just by reducing the number of rows which must be examined, but also by (potentially) eliminating the need for a separate "sort" phase after the needed rows are found. Hopefully it also makes it clear that a compound index on a,b is completely different from one on b,a. I could go on giving more "phone book" examples, but this answer would become so long that it would be more like a blog post. To build your intuition on which indexes are likely to benefit a query, I recommend Bill Karwin's book "SQL Antipatterns" from Pragmatic Bookshelf (especially chapter 13, "Index Shotgun").
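The phone-book analogy can be checked directly. The Python sketch below (standard sqlite3 module) uses a hypothetical people table and a hypothetical index name, idx_phone_book, and compares the query plan for two column orders of the same compound index. The range test on last_name stands in for "names starting with A".

```python
import sqlite3

def plan_for(index_columns):
    """Build a fresh, empty people table with one compound index and
    return the EXPLAIN QUERY PLAN detail lines for the phone-book query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people ("
        "last_name TEXT NOT NULL, "
        "first_name TEXT NOT NULL, "
        "postal_code TEXT NOT NULL)"
    )
    conn.execute(f"CREATE INDEX idx_phone_book ON people({index_columns})")
    sql = """
    EXPLAIN QUERY PLAN
    SELECT last_name, first_name, postal_code
    FROM people
    WHERE last_name >= 'A' AND last_name < 'B'
    ORDER BY last_name, first_name, postal_code
    """
    details = [row[-1] for row in conn.execute(sql)]
    conn.close()
    return details

# "Phone book" ordered by name, then postal code: matches the query.
name_first = plan_for("last_name, first_name, postal_code")
# Same columns, different order: the entries are grouped by postal code,
# so the names are scattered and the result still has to be sorted.
postal_first = plan_for("postal_code, last_name, first_name")

for line in name_first:
    print("name first:  ", line)
for line in postal_first:
    print("postal first:", line)
```

With the name-first index the plan is a range search with no sort step, while the postal-first index leaves SQLite with a temporary B-tree for the ORDER BY, which is the "sort them all manually" case from the illustration.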

Alex D answered Oct 09 '22 19:10
