Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL uses wrong index

Tags:

indexing

mysql

There is a table with these columns:

Name Type
Id int
DateTime datetime(6)
CompanyId int (FK)
IsExcluded tinyint(1)

There are 2 BTREE indexes:

  1. CompanyId
  2. ComapnyId,DateTime,IsExcluded

Following select is using the first index with only one column and this select takes 2.3sec. When I force the second index select takes 0.015sec. Also when I decrease DateTime range by one day MySQL is using the second index without forcing.

select IsExcluded,DateTime,CompanyId FROM table where 
IsExcluded = 0 and 
DateTime >= '2022-06-02' and 
DateTime < '2022-09-22' and 
CompanyId = 1;

I understand that if more than ~20%-30% of rows are selected, MySQL could decide to ignore index, but I don't understand why MySQL is selecting different obviously not the most suitable index.

Is there any way how to setup or "learn" MySql that second index for this query is the most suitable (without inserting anything else to query) ?

like image 453
Davidm176 Avatar asked Sep 17 '25 11:09

Davidm176


1 Answers

Is there any way how to setup or "learn" MySql that second index for this query is the most suitable (without inserting anything else to query) ?

You can in theory tweak the index statistics manually to influence the optimizer's choice. See https://dev.mysql.com/blog-archive/histogram-statistics-in-mysql/

But frankly, I know of no developers who use this feature. It's just too difficult to figure out how to use it, and any custom index statistics you choose would be likely to get out of date very quickly.

What is more common is to use index hints to tell the optimizer to consider only a specific index, or to ignore other indexes. See https://dev.mysql.com/doc/refman/8.0/en/index-hints.html

Example:

SELECT IsExcluded, DateTime, CompanyId 
FROM table USE INDEX (myindex_with_three_columns)
WHERE IsExcluded = 0 
 AND DateTime >= '2022-06-02' 
 AND DateTime < '2022-09-22' 
 AND CompanyId = 1;

(I'm guessing at the name of your 3-column index.)

I would guess the reason your index is not used is that the DateTime portion is not helping enough to justify the wider index (i.e. your date range covers too many rows), and the IsExcluded portion is not used at all, because it follows a column that is used in a range condition. So the optimizer chooses an index that's more compact, so it can load the index into RAM in fewer I/O reads.

What's important to understand about column order in the index is that the columns involved in equality conditions should be to the left. Then you can have one column that is used in an inequality or range condition, and any subsequent columns in the index are not used for searching or sorting.

In your case you have an index on (CompanyId,DateTime,IsExcluded), but the second column is used in a range condition, so the third column isn't used for the search. At best it could try to use index condition pushdown, but that's not as good as really narrowing down the search.

The better index would be to order the columns so the DateTime column is last. Either of the columns used for equality may be first, but they both have to be before the DateTime column. Either of these orders would be better in this case:

(CompanyId,IsExcluded,DateTime)
(IsExcluded,CompanyId,DateTime)
like image 192
Bill Karwin Avatar answered Sep 19 '25 01:09

Bill Karwin