Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does Index Scope work in Mysql?

In the MySQL manual there is a page on index hinting that mentions that you can specify the index hinting for specific parts of the query.

You can specify the scope of an index hint by adding a FOR clause to the hint. This provides more fine-grained control over the optimizer's selection of an execution plan for various phases of query processing. To affect only the indexes used when MySQL decides how to find rows in the table and how to process joins, use FOR JOIN. To influence index usage for sorting or grouping rows, use FOR ORDER BY or FOR GROUP BY.

However, there is little to no more information about how this works or what it actually does in the MySQL optimizer. As well in practice it appears to be negligible in actually improving anything.

Here is a test query and what explain says about the query:

SELECT 
    `property`.`primary_id` AS `id` 
FROM `California` `property`

USE INDEX FOR JOIN (`Zipcode Bedrooms`)
USE INDEX FOR ORDER BY (`Zipcode Bathrooms`)

INNER JOIN `application_zipcodes` `az`
    ON `az`.`application_id` = '18'
    AND `az`.`zipcode` = `property`.`zipcode`

WHERE `property`.`city` = 'San Jose'
AND `property.`zipcode` = '95133'
AND `property`.property_type` = 'Residential'
AND `property`.`style` = 'Condominium'
AND `property`.`bedrooms` = '3'
ORDER BY `property`.`bathrooms` ASC
LIMIT 15
;

Explain:

EXPLAIN SELECT `property`.`primary_id` AS `id` FROM `California` `property` USE INDEX FOR JOIN (`Zipcode Bedrooms`) USE INDEX FOR ORDER BY (`Zipcode Bathrooms`) INNER JOIN `application_zipcodes` `az` ON `az`.`application_id` = '18' AND `az`.`zipcode` = `property`.`zipcode` WHERE `property`.`city` = 'San Jose' AND `property.`zipcode` = '95133' AND `property`.property_type` = 'Residential' AND `property`.`style` = 'Condominium' AND `property`.`bedrooms` = '3' ORDER BY `property`.`bathrooms` ASC LIMIT 15\g
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
| id   | select_type | table    | type   | possible_keys | key     | key_len | ref                                | rows | Extra                                              |
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
|    1 | SIMPLE      | Property | ref    | Zip Bed       | Zip Bed | 17      | const,const                        | 2364 | Using index condition; Using where; Using filesort |
|    1 | SIMPLE      | az       | eq_ref | PRIMARY       | PRIMARY | 7       | const,Property.zipcode             |    1 | Using where; Using index                           |
+------+-------------+----------+--------+---------------+---------+---------+------------------------------------+------+----------------------------------------------------+
2 rows in set (0.01 sec)

So to summarize I am basically wondering how the index scope was meant to be used, as this doesn't seem to do anything when I add or remove the line USE INDEX FOR ORDER BY (Zipcode Bathrooms).

like image 436
DevinMcBeth Avatar asked Oct 08 '15 19:10

DevinMcBeth


1 Answers

I have yet to figure out how multiple hints can be used. MySQL will almost never use more than one index per SELECT. The only exception I know of is with "index merge", which is not relevant in your example.

The Optimizer usually focuses on finding a good index for the WHERE clause. If it entirely covers the WHERE, without any "ranges", then it checks to see if there are GROUP BY and ORDER BY fields, in the right order, to use. If it can handle all of WHERE, GROUP BY, and ORDER BY, then it can actually optimize the LIMIT (but not OFFSET).

If the Optimizer can't consume all the WHERE, it may reach into the ORDER BY in hopes avoiding the "filesort" that ORDER BY otherwise requires.

None of this allows for different indexes for different clauses. A single hint may encourage the use of one of the above cases (above) in preference to the other; I don't know.

Don't use utf8 for zipcode; it makes things bulkier than necessary (3 bytes per character). In general, shrinking the size of the table will help performance some. Or, if you have a huge dataset, it may help perf a lot. (Avoiding I/O is very important.)

Bathrooms is not very selective; there is not much to gain even if it would be possible.

az.application_id is the big monkey wrench in the query; what is it?

like image 86
Rick James Avatar answered Nov 15 '22 00:11

Rick James