Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does MySQL use collations with indexes?

Tags:

indexing

mysql

I'm wondering if MySQL takes collation into account when generating an index, or if the index is generated the same regardless of collation, the collation only being taken into account when later traversing that index.

For my purposes, I'd like to use the collation utf8_unicode_ci on a field. I know this particular collation has a relatively high performance penalty, but it's still important to me to use it.

I have an index on that field which is being used to satisfy an ORDER BY clause, retrieving the rows in order quickly (avoiding a filesort). However, I'm not sure whether using this collation is going to affect the speed of rows as they are read back from the index, or if the index stores data in an already-normalised state according to that collation, allowing for the performance penalty to be entirely in generating the index and not reading it back.

like image 812
thomasrutter Avatar asked Mar 12 '09 02:03

thomasrutter


People also ask

How does MySQL indexing work internally?

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs.

What is collation in MySQL why it is used?

A collation is a set of rules that defines how to compare and sort character strings. Each collation in MySQL belongs to a single character set. Every character set has at least one collation, and most have two or more collations. A collation orders characters based on weights.

Does adding an index lock a table?

Yes you can. It will lock the table you're adding an index to while it's being created. If the table is large, it may take awhile as it has to read each row while building the index.

Why MySQL does not pick correct index for few queries?

MySQL can only use an index for searches up to the first range. Removing the range requirement on log_type should improve performance, but you may get mixed results by adding it in later in an outer query. If you do this, you'll also have to remove log_type from the covering index.


1 Answers

I believe that the btree structure will be different because it has to compare the column values differently.

Look at these two query plans:

mysql> explain select * from sometable where keycol = '3';
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | pro   | ref  | PRIMARY       | PRIMARY | 66      | const |   34 | Using where; Using index | 
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+


mysql> explain select * from sometable where binary keycol = '3';
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using where; Using index | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+

If we change the collation for the comparison, suddenly it isn't even able to seek the index anymore and has to scan every row. The actual values stored in the index will be the same regardless of collation, for instance, because it will still return the value in its original casing regardless of whether it's using a case sensitive or case insensitive collation.

So lookups against a case insensitive collation should be a little less efficient.

However, I doubt you'd ever be able to notice the difference; note that MySQL makes everything case insensitive by default, so the impact can't be that terrible.

UPDATE:

You can see a similar effect for order by operations:

mysql> explain select * from sometable order by keycol collate latin1_general_cs;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using index; Using filesort | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+

mysql> explain select * from sometable order by keycol ;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using index | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+

Note the extra 'filesort' stage required to execute the query. That means mysql is queuing up the result in a temporary buffer and sorting it itself using a quicksort in an extra stage, throwing out whatever the index order was. Using the original collation this step is uneccessary as mysql knows the order from index initially.

like image 55
ʞɔıu Avatar answered Sep 20 '22 23:09

ʞɔıu