MySQL looking for a nice index

Tags:

sql

select

indexing

mysql

I have this table (simplified version)

create table completions (
  id int(11) not null auto_increment,
  completed_at datetime default null,
  is_mongo_synced tinyint(1) default '0',
  primary key (id),
  key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id)
) engine=innodb auto_increment=4785424 default charset=utf8 collate=utf8_unicode_ci;

Size:

select count(*) from completions; -- => 4817574

Now I try to execute this query:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

It takes 9 minutes.

MySQL does not use my composite index; EXPLAIN EXTENDED shows it scanning the PRIMARY key instead:

id: 1 
select_type: SIMPLE
table: completions 
type: index 
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id  
key: PRIMARY 
key_len: 4 
ref: NULL  
rows: 20  
filtered: 11616415.00 
Extra: Using where
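For anyone reproducing these plans: on MySQL 5.6, EXTENDED is what adds the filtered column to EXPLAIN (it is deprecated in 5.7 and removed in 8.0, where plain EXPLAIN already includes it), and running SHOW WARNINGS immediately afterwards prints the statement as the optimizer rewrote it. A sketch against the query above, assuming a 5.6-era server like the one in the question:

```sql
-- Show the plan including the "filtered" column (EXTENDED needed on MySQL 5.6).
EXPLAIN EXTENDED
SELECT completions.*
FROM completions
WHERE completions.completed_at IS NOT NULL
  AND completions.is_mongo_synced = 0
ORDER BY completions.id ASC
LIMIT 10;

-- Run right after the EXPLAIN to see the optimizer's rewritten form of the query.
SHOW WARNINGS;
```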

If I force the index:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

It takes 1.22 s, which is much better. EXPLAIN EXTENDED returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323334
filtered: 100
Extra: Using index condition; Using filesort

Now, if I narrow the query by completions.id:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

It takes 1.31 s, still good. EXPLAIN EXTENDED returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id
key: index_completions_on_completed_at_and_is_mongo_synced_and_id
key_len: 6
ref: null
rows: 2323407
filtered: 100
Extra: Using index condition; Using filesort

However, if I run that last query without forcing the index:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

It takes 85 ms (milliseconds, not seconds). EXPLAIN EXTENDED returns:

id: 1
select_type: SIMPLE
table: completions
type: range
possible_keys: PRIMARY,index_completions_on_completed_at_and_is_mongo_synced_and_id
key: PRIMARY
key_len: 4
ref: null
rows: 2323451
filtered: 100
Extra: Using where

This is driving me nuts, and so is the fact that the performance of the last query is highly sensitive to small changes in the id value used in the filter:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 1600000
  order by completions.id asc limit 10;

It takes 13 s.

Things I don't understand:

1. Why is query A below faster than query B, when query B is supposed to use a more precise index?

Query A:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

85ms

Query B:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

1.31 s

2. Why is there such a difference in performance between the queries below?

Query A:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;

85ms

Query B:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 1600000
  order by completions.id asc limit 10;

13s

3. Why doesn't MySQL use the index automatically for the query below?

Index:

key index_completions_on_completed_at_and_is_mongo_synced_and_id (completed_at,is_mongo_synced,id),

Query:

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  order by completions.id asc limit 10;
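When the optimizer's choice looks wrong, a common first step is to refresh and inspect the index statistics it bases its row estimates on. This is a sketch of that check, not a guaranteed fix: the estimates in the plans above may or may not change afterwards.

```sql
-- Recompute index statistics; InnoDB estimates cardinality by sampling index pages.
ANALYZE TABLE completions;

-- Inspect the per-index cardinality estimates the optimizer will now use.
SHOW INDEX FROM completions;
```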

Update

Some more data, as requested in the comments:

Number of rows per is_mongo_synced value:

 select
     completions.is_mongo_synced,
     count(*)
 from completions
 group by completions.is_mongo_synced;

Result:

[
  {
    "is_mongo_synced":0,
    "count(*)":2731921
  },
  {
    "is_mongo_synced":1,
    "count(*)":2087869
  }
]
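The counts above cover only is_mongo_synced. A similar query, sketched below, would show how many rows satisfy the full WHERE clause used throughout the question, which is closer to the selectivity the optimizer has to estimate:

```sql
-- Rows matching the complete filter (both conditions), not just is_mongo_synced = 0.
SELECT COUNT(*) AS matching_rows
FROM completions
WHERE completions.completed_at IS NOT NULL
  AND completions.is_mongo_synced = 0;
```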

Queries without ORDER BY:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  limit 10;

544ms

select completions.* 
from completions  
force index(index_completions_on_completed_at_and_is_mongo_synced_and_id)
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  and completions.id > 2000000
  limit 10;

314ms

But I need the ORDER BY anyway, because I'm scanning the table batch by batch.
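A common pattern for this kind of batch-by-batch scan is keyset pagination: remember the largest id from the previous batch and start the next batch right after it, rather than using OFFSET. A minimal sketch, assuming a hypothetical session variable @last_id that the application updates after each batch:

```sql
-- Hypothetical: @last_id holds the largest id processed so far (start at 0).
SET @last_id = 0;

-- Fetch the next batch of unsynced, completed rows in id order.
SELECT completions.*
FROM completions
WHERE completions.completed_at IS NOT NULL
  AND completions.is_mongo_synced = 0
  AND completions.id > @last_id
ORDER BY completions.id ASC
LIMIT 10;

-- After processing, the application sets @last_id to the highest id it received
-- (SET @last_id = <that id>) and repeats the SELECT until it returns no rows.
```

Each batch then starts at a known id instead of re-reading earlier rows, which is the same shape as the id > 2000000 queries above.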

asked Dec 10 '15 12:12

fguillen

1 Answer

Your questions are quite complicated. But for your first query:

select completions.* 
from completions  
where completed_at is not null and
      completions.is_mongo_synced = 0 
order by completions.id asc
limit 10;

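The best index is on (is_mongo_synced, completed_at): is_mongo_synced is compared with equality, so it should come first, and completed_at is not null is a range condition, so it belongs after it. There might be other ways to write the query, but in the index you are forcing, the columns are not in an optimal order.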
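A sketch of adding the index suggested here; the index name is hypothetical. On a table of about 4.8 million rows the ALTER itself may take a while, and whether ORDER BY id can then avoid a filesort is worth confirming with the EXPLAIN that follows rather than assuming.

```sql
-- Hypothetical index name: equality column first, range column second.
ALTER TABLE completions
  ADD INDEX idx_completions_synced_completed_at (is_mongo_synced, completed_at);

-- Check which index the optimizer now picks for the first query.
EXPLAIN
SELECT completions.*
FROM completions
WHERE completions.completed_at IS NOT NULL
  AND completions.is_mongo_synced = 0
ORDER BY completions.id ASC
LIMIT 10;
```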

The difference in performance in your second query is probably because the data is actually being sorted. A few hundred thousand extra rows can affect the sort time. The dependence on the value of id is probably why the index is not used. If you changed the index to (is_mongo_synced, id, completed_at), then index usage would be more likely.
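One way to investigate the dependence on the id value with your own data is to see how the rows matching the filter are spread across id ranges. When MySQL reads the PRIMARY key in id order (as in the 85 ms plan in the question), it can stop once it has ten matches, so run time grows with the number of non-matching rows between the id threshold and the tenth match. A minimal sketch; the bucket size of 100,000 is an arbitrary choice, and the query reads the whole table:

```sql
-- Matching vs. total rows per id bucket of 100,000 (bucket size is arbitrary).
SELECT FLOOR(completions.id / 100000) * 100000 AS id_bucket_start,
       SUM(completions.is_mongo_synced = 0
           AND completions.completed_at IS NOT NULL) AS matching_rows,
       COUNT(*) AS total_rows
FROM completions
GROUP BY id_bucket_start
ORDER BY id_bucket_start;
```

Buckets between 1,600,000 and 2,000,000 with few matching rows would be consistent with the 13 s versus 85 ms gap; the output is what tells.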

MySQL has good documentation on composite (multiple-column) indexes, which you might want to review.

After adding the suggested index

After adding the index:

KEY `index_completions_on_is_mongo_synced_and_id_and_completed_at` (`is_mongo_synced`,`id`,`completed_at`) USING BTREE,

Then running the long query again:

select completions.* 
from completions  
where 
  (completed_at is not null) 
  and completions.is_mongo_synced = 0 
  order by completions.id asc limit 10;

It takes 156ms, which is very good.

EXPLAIN EXTENDED shows that MySQL is now using the new index:

id: 1
select_type: SIMPLE
table: completions
type: ref
possible_keys: index_completions_on_completed_at_and_is_mongo_synced_and_id,index_completions_on_is_mongo_synced_and_id_and_completed_at
key: index_completions_on_is_mongo_synced_and_id_and_completed_at
key_len: 2
ref: const
rows: 1626322
filtered: 100
Extra: Using index condition; Using where
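To check whether the id-filtered variant from the question benefits as well, the same kind of EXPLAIN can be run on it. Because the new index is ordered by is_mongo_synced and then id, the conditions is_mongo_synced = 0 and id > 2000000 correspond to one contiguous stretch of it, already in id order; whether the optimizer actually chooses it is something to read from the output rather than assume.

```sql
-- Plan check for the id-filtered query against the new (is_mongo_synced, id, completed_at) index.
EXPLAIN
SELECT completions.*
FROM completions
WHERE completions.completed_at IS NOT NULL
  AND completions.is_mongo_synced = 0
  AND completions.id > 2000000
ORDER BY completions.id ASC
LIMIT 10;
```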

answered Oct 06 '22 17:10

Gordon Linoff
