Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL looking for a nice index

I have this table (simplified version)

create table completions (
  id int(11) not null auto_increment,
  completed_at datetime default null,
  is_mongo_synced tinyint(1) default '0',
  primary key (id),
  key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id),
) engine=innodb auto_increment=4785424 default charset=utf8 collate=utf8_unicode_ci;

Size:

select count(*) from completions; -- => 4817574

Now I try to execute this query:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

And it takes 9mins.

I see there is not any index used, the explain extend returns this:

id: 1 
select_type: SIMPLE
table: completions 
type: index 
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id  
key: PRIMARY 
key_len: 4 
ref: NULL  
rows: 20  
filtered: 11616415.00 
Extra: Using where

If I force the index:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

It takes 1,22s, which is much better. The explain extend returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323334
filtered: 100
Extra: Using index condition; Using filesort

Now if I narrow the query by completions.id like:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

It takes 1,31s, still good. The explain extend returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323407
filtered: 100
Extra: Using index condition; Using filesort

The point is that if for the last query I don't force the index:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

It takes 85ms, check that it is ms and not s. The explain extend returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: PRIMARYindex_completions_on_completed_at_and_is_mongo_synced_and_id
key: PRIMARY
key_len: 4
ref: null
rows: 2323451
filtered: 100
Extra: Using where

Not only this is making me nuts but also the fact that the performance of the last query is highly affected for small changes in the number of the filter:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 1600000
  order by completions.id asc limit 10;

It takes 13s

Things I don't understand:

  1. Why this the below query A is faster than query B when query B suppose to use a more precise index: c

Query A:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

85ms

Query B:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

1,31s

2. Why such a difference in performan among the below queries:

Query A:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

85ms

Query B:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 1600000
  order by completions.id asc limit 10;

13s

3. Why MySQL is not using automatically the index for the below query:

Index:

key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id),

Query:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

Update

Some more data requested in the comments

Num of rows based on is_mongo_synced values
 select
     completions.is_mongo_synced,
     count(*)
 from completions
 group by completions.is_mongo_synced;

Result:

[
  {
    "is_mongo_synced":0,
    "count(*)":2731921
  },
  {
    "is_mongo_synced":1,
    "count(*)":2087869
  }
]
Queries without order by
select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  limit 10;

544ms

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  limit 10;

314ms

But, anyhow, I need the order because I'm scanning the table batch by batch.

like image 964
fguillen Avatar asked Dec 10 '15 12:12

fguillen


People also ask

How does MySQL choose which index to use?

If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index). If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows.

How do I choose a good primary index?

Choose columns for the primary index that distribute table rows evenly across the AMPs. The more singular the values for a column, the more optimal their distribution. Choose as few columns as possible for the primary index to optimize its generality.

Which columns should be indexed in MySQL?

In some cases, it also makes sense to create a separate index that contains the ORDER BY clause's columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause and they should all be specified with the same order (ASC / DESC).

Why MySQL does not pick correct index for few queries?

MySQL can only use an index for searches up to the first range. Removing the range requirement on log_type should improve performance, but you may get mixed results by adding it in later in an outer query. If you do this, you'll also have to remove log_type from the covering index.

What is an index in MySQL?

MySQL uses indexes to quickly find rows with specific column values. Without an index, MySQL must scan the whole table to locate the relevant rows. The larger table, the slower it searches. In this section, you will learn about MySQL index including creating indexes, removing indexes, listing all indexes of a table and other important features ...

Where can I find more information about MySQL's indexing algorithm?

More information and a pseudo-algorithm can be found on MySQL's documentation. If you're looking for a way to automate your index creation, while also adding the benefit of a proprietary indexing algorithm and query optimization recommendations, you can try out EverSQL Index Advisor which does all the heavy lifting for you.

How do I find the best index for my database?

So, we decided to write about it. The first option is to use EverSQL to automatically find indexes that are best for your database. The second option is to read our detailed tutorial below and learn more about indexing best practices.

Why can't I create indexes for views in MySQL?

Because you can't create index for views in mysql like in oracle. But in some test that I took, indexes can be used in view select statement. Maybe I've created those views in a wrong way. I'll use a example to describe my problem. We have a table that records data for high scores in NBA games, with index on column [happend_in]


Video Answer


1 Answers

Your questions are quite complicated. But, your for your first query:

select completions.* 
from completions  
where completed_at is not null and
      completions.is_mongo_synced = 0 
order by completions.id asc
limit 10;

The best index in on (is_mongo_synced, completed_at). There might be other ways to write the query, but in the index you are forcing, the columns are not in an optimal order.

The difference in performance in your second query is probably because the data is actually being sorted. A few extra hundreds of thousands of rows can affect the sort time. The dependence on the value of id is probably way the index is not used. If you changed the index to (is_mongo_synced, id, completed_at), then index usage would be more likely.

MySQL has good documentation on composite indexes. You might want to review it here.

After adding the suggested filter

After adding the index:

KEY `index_completions_on_is_mongo_synced_and_id_and_completed_at` (`is_mongo_synced`,`id`,`completed_at`) USING BTREE,

And executing the long query again

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

It takes 156ms, which is very good.

Checking the explain extended we see MySQL is using the correct index:

id: 1
select_type: SIMPLE
table: completions
type: ref
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id,index_completions_on_is_mongo_synced_and_id_and_completed_at
key: index_completions_on_is_mongo_synced_and_id_and_completed_at
key_len: 2
ref: const
rows: 1626322
filtered: 100
Extra: Using index condition; Using where
like image 193
Gordon Linoff Avatar answered Oct 06 '22 17:10

Gordon Linoff