
Cannot achieve a cover index with this table (2 equalities and one selection)?

CREATE TABLE `discount_base` (
  `id` varchar(12) COLLATE utf8_unicode_ci NOT NULL,
  `amount` decimal(13,4) NOT NULL,
  `description` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `family` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
  `customer_id` varchar(8) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_CUSTOMER` (`customer_id`),
  KEY `IDX_FAMILY_CUSTOMER_AMOUNT` (`family`,`customer_id`,`amount`),
  CONSTRAINT `FK_CUSTOMER` FOREIGN KEY (`customer_id`) 
      REFERENCES `customer` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

I've added a covering index IDX_FAMILY_CUSTOMER_AMOUNT on family, customer_id and amount because most of the time I use the following query:

SELECT amount FROM discount_base WHERE family = :family AND customer_id = :customer_id

However, running EXPLAIN against a bunch of records (~250000) says:

'1', 'SIMPLE', 'discount_base', 'ref', 'IDX_CUSTOMER,IDX_FAMILY_CUSTOMER_AMOUNT', 'IDX_FAMILY_CUSTOMER_AMOUNT', '40', 'const,const', '1', 'Using where; Using index'

Why am I getting Using where; Using index instead of just Using index?

EDIT: Fiddle with a small amount of data (Using where; Using index):

EXPLAIN SELECT amount
FROM discount_base
WHERE family = '0603' and customer_id = '20000275';

Another fiddle where id is family + customer_id (const):

EXPLAIN SELECT amount
FROM discount_base
WHERE `id` = '060320000275';
gremo asked May 18 '15 12:05


5 Answers

Interesting problem. It would seem "obvious" that the IDX_FAMILY_CUSTOMER_AMOUNT index would be used for this query:

SELECT amount
FROM discount_base
WHERE family = :family AND customer_id = :customer_id;

"Obvious" to us people, but clearly not to the optimizer. What is happening?

This aspect of index usage is poorly documented. I (intelligently) speculate that when doing comparisons on strings using case-insensitive collations (and perhaps others), then the = operation is really more like an in. Something sort of like this, conceptually:

WHERE family in (lower(:family), upper(:family), . . .) and . . .

This is conceptual. But it means that an index scan is required for the = rather than an index lookup. A minor change typographically, but a very important one semantically: it prevents the use of the second key. Yup, that is an unfortunate consequence of inequalities, even when they look like =.

So, the optimizer compares the two possible indexes, and it decides that customer_id is more selective than family, and chooses the former.

Alas, both of your keys are case-insensitive strings. My suggestion would be to replace at least one of them with an auto-incrementing integer id. In fact, my suggestion is that basically all tables have an auto-incrementing integer id, which is then used for all foreign key references.

Another solution would be to use a trigger to create a single column CustomerFamily with the values concatenated together. Then this index:

KEY IDX_CUSTOMERFAMILY_AMOUNT (CustomerFamily, amount)

should do what you want. It is also possible that a case-sensitive collation would solve the problem.
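
A minimal sketch of that trigger-based approach, assuming MySQL. The column name CustomerFamily and the index name come from the suggestion above; the trigger names, the column type and the default value are assumptions made for illustration:

```sql
-- Hypothetical helper column: customer_id (up to 8 chars) followed by family (up to 4 chars).
ALTER TABLE discount_base
  ADD COLUMN CustomerFamily varchar(12) COLLATE utf8_unicode_ci NOT NULL DEFAULT '';

-- Keep the column in sync on every write (trigger names are made up).
CREATE TRIGGER discount_base_bi BEFORE INSERT ON discount_base
FOR EACH ROW SET NEW.CustomerFamily = CONCAT(NEW.customer_id, NEW.family);

CREATE TRIGGER discount_base_bu BEFORE UPDATE ON discount_base
FOR EACH ROW SET NEW.CustomerFamily = CONCAT(NEW.customer_id, NEW.family);

-- Backfill the rows that already exist.
UPDATE discount_base SET CustomerFamily = CONCAT(customer_id, family);

ALTER TABLE discount_base
  ADD KEY IDX_CUSTOMERFAMILY_AMOUNT (CustomerFamily, amount);

-- The lookup then needs a single equality on the combined column.
SELECT amount
FROM discount_base
WHERE CustomerFamily = CONCAT(:customer_id, :family);
```

Plain concatenation is only unambiguous if customer_id always has the same length; otherwise a separator character between the two parts would be needed.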

Gordon Linoff answered Nov 15 '22 21:11


Are family and customer_id strings? I guess you could be passing customer_id as an integer, which would cause a type conversion to take place, so the index would not be used for that particular column.

Ensure you pass customer_id as a string, or consider changing your table to store customer_id as INT.

If you are using alphanumeric ids then this doesn't apply.
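
One way to check this, sketched with the literal values from the question's fiddle, is to compare the plans for a quoted and an unquoted customer_id. When a string column is compared with a number, MySQL has to convert the column values, so it cannot use the index to look up that column directly:

```sql
-- customer_id passed as a string: both equality conditions can be used for the index lookup.
EXPLAIN SELECT amount
FROM discount_base
WHERE family = '0603' AND customer_id = '20000275';

-- customer_id passed as a number: the varchar column is converted for the comparison,
-- so the customer_id part of the index cannot serve as a lookup key.
EXPLAIN SELECT amount
FROM discount_base
WHERE family = '0603' AND customer_id = 20000275;
```

If the application binds :customer_id as an integer, the second form is effectively what the server sees. Comparing the key_len and ref columns of the two plans should show the difference.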

Juan answered Nov 15 '22 20:11


I'm pretty sure Using index is the important part, and it means "using a covering index".

Two things to further check:

EXPLAIN FORMAT=JSON SELECT ...

may give further clues.

FLUSH STATUS;
SELECT ...;
SHOW SESSION STATUS LIKE 'Handler%';

will show you how many rows were read, written, etc. in various ways. If some number is about 250000 (in your case), it indicates a table scan. If all the numbers are small (approximately the number of rows returned by the query), then you can be assured that the query was executed efficiently.

The numbers there do not distinguish between reads of an index and reads of the data, and they are not affected by caching. Timings (for two identical runs) can differ significantly due to caching; Handler% values won't change.

Rick James answered Nov 15 '22 21:11


The answer to your question relies on what the engine is actually using your index for.

In the given query, you ask the engine to:

  1. Look up values (WHERE/JOIN)
  2. Retrieve information (SELECT) based on this lookup result

For the first part, as soon as you filter the results (lookup), there's an entry in Extra indicating USING WHERE, so this is the reason you see it in your explain plan.

For the second part, the engine does not need to go anywhere outside the chosen index because it is a covering index. The explain plan indicates this by showing USING INDEX. This USING INDEX hint, combined with USING WHERE, means your index is also used in the lookup portion of the query, as explained in the MySQL documentation:

https://dev.mysql.com/doc/refman/5.0/en/explain-output.html

Using index

The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.

If the Extra column also says Using where, it means the index is being used to perform lookups of key values. Without Using where, the optimizer may be reading the index to avoid reading data rows but not using it for lookups. For example, if the index is a covering index for the query, the optimizer may scan it without using it for lookups.

Check this fiddle:

http://sqlfiddle.com/#!9/8cdf2/10

I removed the where clause and the query now displays USING INDEX only. This is because no lookup is necessary in your table.
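
The comparison described above can be reproduced with two EXPLAIN statements, using the literal values from the question. The Extra values in the comments are the ones reported in this thread, not guaranteed output for every MySQL version or data set:

```sql
-- No WHERE clause: the covering index is scanned and no lookup is needed.
-- Reported Extra: Using index
EXPLAIN SELECT amount
FROM discount_base;

-- With the WHERE clause: the same index is used for the lookup as well.
-- Reported Extra: Using where; Using index
EXPLAIN SELECT amount
FROM discount_base
WHERE family = '0603' AND customer_id = '20000275';
```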

Sebas answered Nov 15 '22 20:11


The MySQL documentation on EXPLAIN has this to say:

Using index

The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row. This strategy can be used when the query uses only columns that are part of a single index.

If the Extra column also says Using where, it means the index is being used to perform lookups of key values. Without Using where, the optimizer may be reading the index to avoid reading data rows but not using it for lookups. For example, if the index is a covering index for the query, the optimizer may scan it without using it for lookups.

My best guess, based on the information you have provided, is that the optimizer first uses your IDX_CUSTOMER index and then performs a key lookup to retrieve non-key data (amount and family) from the actual data page based on the key (customer_id). This is most likely caused by the cardinality (i.e. the number of distinct values) of the columns in your indexes. You should check the cardinality of the columns used in your where clause and put the one with the highest cardinality first in your index. Guessing from the column names and your current results, customer_id has the highest cardinality.

So change this:

KEY `IDX_FAMILY_CUSTOMER_AMOUNT` (`family`,`customer_id`,`amount`)

to this:

KEY `IDX_FAMILY_CUSTOMER_AMOUNT` (`customer_id`,`family`,`amount`)

After making the change, you should run ANALYZE TABLE to update the table statistics, which can affect the choices the optimizer makes regarding your indexes.
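
The steps above could be carried out roughly as follows. This is a sketch assuming MySQL with the table from the question; the COUNT(DISTINCT ...) query is just one possible way to compare cardinalities:

```sql
-- Compare how many distinct values each WHERE column has.
SELECT COUNT(DISTINCT family)      AS family_cardinality,
       COUNT(DISTINCT customer_id) AS customer_cardinality
FROM discount_base;

-- The Cardinality column here shows the optimizer's current estimates per index part.
SHOW INDEX FROM discount_base;

-- Rebuild the covering index with customer_id first.
ALTER TABLE discount_base
  DROP INDEX IDX_FAMILY_CUSTOMER_AMOUNT,
  ADD INDEX IDX_FAMILY_CUSTOMER_AMOUNT (customer_id, family, amount);

-- Refresh the statistics so the optimizer sees the new index distribution.
ANALYZE TABLE discount_base;
```

The index name is kept unchanged here to match the answer, although it no longer reflects the column order.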

prudentcoder answered Nov 15 '22 20:11