I have this table (simplified version)
create table completions (
id int(11) not null auto_increment,
completed_at datetime default null,
is_mongo_synced tinyint(1) default '0',
primary key (id),
key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id),
) engine=innodb auto_increment=4785424 default charset=utf8 collate=utf8_unicode_ci;
Size:
select count(*) from completions; -- => 4817574
Now I try to execute this query:
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
order by completions.id asc limit 10;
And it takes 9mins.
I see there is not any index used, the explain extend
returns this:
id: 1
select_type: SIMPLE
table: completions
type: index
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: PRIMARY
key_len: 4
ref: NULL
rows: 20
filtered: 11616415.00
Extra: Using where
If I force the index:
select completions.*
from completions
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where
(completed_at is not null)
and completions.is_mongo_synced = 0
order by completions.id asc limit 10;
It takes 1,22s, which is much better. The explain extend
returns:
id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323334
filtered: 100
Extra: Using index condition; Using filesort
Now if I narrow the query by completions.id
like:
select completions.*
from completions
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
order by completions.id asc limit 10;
It takes 1,31s, still good. The explain extend
returns:
id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323407
filtered: 100
Extra: Using index condition; Using filesort
The point is that if for the last query I don't force the index:
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
order by completions.id asc limit 10;
It takes 85ms, check that it is ms and not s. The explain extend
returns:
id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: PRIMARYindex_completions_on_completed_at_and_is_mongo_synced_and_id
key: PRIMARY
key_len: 4
ref: null
rows: 2323451
filtered: 100
Extra: Using where
Not only this is making me nuts but also the fact that the performance of the last query is highly affected for small changes in the number of the filter:
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 1600000
order by completions.id asc limit 10;
It takes 13s
Things I don't understand:
Query A:
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
order by completions.id asc limit 10;
85ms
Query B:
select completions.*
from completions
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
order by completions.id asc limit 10;
1,31s
Query A:
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
order by completions.id asc limit 10;
85ms
Query B:
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 1600000
order by completions.id asc limit 10;
13s
Index:
key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id),
Query:
select completions.*
from completions
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
order by completions.id asc limit 10;
Some more data requested in the comments
Num of rows based onis_mongo_synced
values
select
completions.is_mongo_synced,
count(*)
from completions
group by completions.is_mongo_synced;
Result:
[
{
"is_mongo_synced":0,
"count(*)":2731921
},
{
"is_mongo_synced":1,
"count(*)":2087869
}
]
Queries without order by
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
limit 10;
544ms
select completions.*
from completions
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where
(completed_at is not null)
and completions.is_mongo_synced = 0
and completions.id > 2000000
limit 10;
314ms
But, anyhow, I need the order because I'm scanning the table batch by batch.
If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index). If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows.
Choose columns for the primary index that distribute table rows evenly across the AMPs. The more singular the values for a column, the more optimal their distribution. Choose as few columns as possible for the primary index to optimize its generality.
In some cases, it also makes sense to create a separate index that contains the ORDER BY clause's columns, as MySQL sometimes chooses to use it. Please note though that for this to happen, the index should contain all columns from the ORDER BY clause and they should all be specified with the same order (ASC / DESC).
MySQL can only use an index for searches up to the first range. Removing the range requirement on log_type should improve performance, but you may get mixed results by adding it in later in an outer query. If you do this, you'll also have to remove log_type from the covering index.
MySQL uses indexes to quickly find rows with specific column values. Without an index, MySQL must scan the whole table to locate the relevant rows. The larger table, the slower it searches. In this section, you will learn about MySQL index including creating indexes, removing indexes, listing all indexes of a table and other important features ...
More information and a pseudo-algorithm can be found on MySQL's documentation. If you're looking for a way to automate your index creation, while also adding the benefit of a proprietary indexing algorithm and query optimization recommendations, you can try out EverSQL Index Advisor which does all the heavy lifting for you.
So, we decided to write about it. The first option is to use EverSQL to automatically find indexes that are best for your database. The second option is to read our detailed tutorial below and learn more about indexing best practices.
Because you can't create index for views in mysql like in oracle. But in some test that I took, indexes can be used in view select statement. Maybe I've created those views in a wrong way. I'll use a example to describe my problem. We have a table that records data for high scores in NBA games, with index on column [happend_in]
Your questions are quite complicated. But, your for your first query:
select completions.*
from completions
where completed_at is not null and
completions.is_mongo_synced = 0
order by completions.id asc
limit 10;
The best index in on (is_mongo_synced, completed_at)
. There might be other ways to write the query, but in the index you are forcing, the columns are not in an optimal order.
The difference in performance in your second query is probably because the data is actually being sorted. A few extra hundreds of thousands of rows can affect the sort time. The dependence on the value of id
is probably way the index is not used. If you changed the index to (is_mongo_synced, id, completed_at)
, then index usage would be more likely.
MySQL has good documentation on composite indexes. You might want to review it here.
After adding the index:
KEY `index_completions_on_is_mongo_synced_and_id_and_completed_at` (`is_mongo_synced`,`id`,`completed_at`) USING BTREE,
And executing the long query again
select completions.*
from completions
where
(completed_at is not null)
and completions.is_mongo_synced = 0
order by completions.id asc limit 10;
It takes 156ms, which is very good.
Checking the explain extended
we see MySQL is using the correct index:
id: 1
select_type: SIMPLE
table: completions
type: ref
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id,index_completions_on_is_mongo_synced_and_id_and_completed_at
key: index_completions_on_is_mongo_synced_and_id_and_completed_at
key_len: 2
ref: const
rows: 1626322
filtered: 100
Extra: Using index condition; Using where
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With