How does MySQL use collations with indexes?

Tags: indexing, mysql

I'm wondering if MySQL takes collation into account when generating an index, or if the index is generated the same regardless of collation, the collation only being taken into account when later traversing that index.

For my purposes, I'd like to use the collation utf8_unicode_ci on a field. I know this particular collation has a relatively high performance penalty, but it's still important to me to use it.

I have an index on that field which is being used to satisfy an ORDER BY clause, retrieving the rows in order quickly (avoiding a filesort). However, I'm not sure whether using this collation is going to affect the speed of rows as they are read back from the index, or if the index stores data in an already-normalised state according to that collation, allowing for the performance penalty to be entirely in generating the index and not reading it back.

asked Mar 12 '09 02:03

thomasrutter

1 Answer

I believe the B-tree structure will differ between collations, because the index has to compare (and therefore order) the column values differently.

Look at these two query plans:

mysql> explain select * from sometable where keycol = '3';
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows | Extra                    |
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+
|  1 | SIMPLE      | pro   | ref  | PRIMARY       | PRIMARY | 66      | const |   34 | Using where; Using index | 
+----+-------------+-------+------+---------------+---------+---------+-------+------+--------------------------+


mysql> explain select * from sometable where binary keycol = '3';
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                    |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using where; Using index | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+--------------------------+

If we change the collation used for the comparison, MySQL can no longer seek into the index: the access type changes from ref to index and the estimated row count goes from 34 to 14417, meaning it scans the whole index. The values stored in the index are the same regardless of collation; for instance, a query still returns each value in its original casing whether the collation is case sensitive or case insensitive. What the collation changes is how those values are compared and ordered.

So lookups against a case-insensitive collation should be a little less efficient.

However, I doubt you'd ever notice the difference. Note that MySQL's default collations are case insensitive, so the impact can't be that terrible.
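To make the seek-versus-scan difference in the two plans above more concrete, here is a minimal Python sketch. It is only a toy model, not how MySQL implements its B-trees: a sorted list stands in for the index, `str.casefold` stands in for a case-insensitive collation, and the names `seek_ci` and `scan_binary` are made up for illustration.

```python
import bisect

# Toy stand-in for an index on a case-insensitive column. Entries are kept
# sorted by a collation key (str.casefold here), but the stored values keep
# their original casing. This is an assumption-laden model, not MySQL code.
values = ["banana", "Apple", "cherry", "apple", "Banana"]
index = sorted(values, key=str.casefold)
keys = [v.casefold() for v in index]  # collation keys, in index order


def seek_ci(needle):
    """Case-insensitive lookup: binary search works because the index is
    ordered by the same key that the comparison uses."""
    k = needle.casefold()
    lo = bisect.bisect_left(keys, k)
    hi = bisect.bisect_right(keys, k)
    return index[lo:hi]


def scan_binary(needle):
    """Exact (byte-for-byte) lookup: the index is not in byte order, so
    binary search is not valid and every entry has to be examined."""
    return [v for v in index if v == needle]


print(index)                 # stored values keep their original casing
print(seek_ci("APPLE"))      # found by seeking
print(scan_binary("apple"))  # found only by scanning everything
```

The stored strings are untouched in both cases; only the comparison rule decides whether the existing order can be used to seek.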

UPDATE:

You can see a similar effect for order by operations:

mysql> explain select * from sometable order by keycol collate latin1_general_cs;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra                       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using index; Using filesort | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-----------------------------+

mysql> explain select * from sometable order by keycol ;
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
| id | select_type | table | type  | possible_keys | key     | key_len | ref  | rows  | Extra       |
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+
|  1 | SIMPLE      | pro   | index | NULL          | PRIMARY | 132     | NULL | 14417 | Using index | 
+----+-------------+-------+-------+---------------+---------+---------+------+-------+-------------+

Note the extra 'Using filesort' stage required to execute the first query. That means MySQL is collecting the result in a temporary buffer and sorting it itself (using a quicksort) in an extra stage, throwing away whatever order the index provided. With the column's original collation this step is unnecessary, because MySQL can read the rows from the index already in the right order.
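The same idea can be sketched for ORDER BY. Again this is a toy Python model under stated assumptions, not MySQL's actual behaviour: the list `index` plays the role of an index kept in case-insensitive order, and `ci_key`, `binary_key` and `order_by` are hypothetical names standing in for two collations and the query executor.

```python
def ci_key(s):
    # Stand-in for a case-insensitive collation.
    return s.casefold()


def binary_key(s):
    # Stand-in for a case-sensitive/binary collation: compare raw bytes.
    return s.encode("utf-8")


values = ["banana", "Apple", "cherry", "apple", "Banana"]
index_collation = ci_key
index = sorted(values, key=index_collation)  # the "index", built once


def order_by(collation):
    """Return (rows, extra_sort_needed) for an ORDER BY under `collation`."""
    if collation is index_collation:
        # Requested order matches the index order: read it front to back.
        return list(index), False
    # Different collation: the index order is useless, so sort again.
    return sorted(index, key=collation), True


print(order_by(ci_key))      # no extra sort
print(order_by(binary_key))  # extra sort, like 'Using filesort'
```

In this model the cost of the collation is paid when the index is built; reading back in the index's own order needs no further comparisons, while any other order forces a full re-sort.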

answered Sep 20 '22 23:09

ʞɔıu
