Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

UTF-8 vs Latin1 mysql, indexes not used on utf-8

Tags:

mysql

encoding

I have tried creating my mysql tables with both UTF-8 and Latin1 Char sets. When I use Latin1 my indexes are used, when I use UTF-8 indexes are not used when selecting/limiting records. Is there something I am missing with the Char sets that causes this to happen?

Cheers

Ke

like image 517
Ke. Avatar asked Feb 15 '10 13:02

Ke.


People also ask

Does UTF 8 include Latin1?

The Chinese alphabet and others do not use Latin1 at all. These characters and symbols are part of a much larger encoding system called UTF8, which also includes Latin1.

What is the difference between UTF 8 and Latin1?

what is the difference between utf8 and latin1? They are different encodings (with some characters mapped to common byte sequences, e.g. the ASCII characters and many accented letters). UTF-8 is one encoding of Unicode with all its codepoints; Latin1 encodes less than 256 characters.

How do I change MySQL from UTF8 to Latin1?

Similarly, here's the command to change character set of MySQL table from latin1 to UTF8. Replace table_name with your database table name. mysql> ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci; Hopefully, the above tutorial will help you change database character set to utf8mb4 (UTF-8).

What is charset Latin1 in MySQL?

The MySQL latin1 character set is such a one-byte character set for Western Europe, and it is the default character set of MySQL up to and including 5.7. In spite of the name, the character set is actually Windows-1252 compliant, which is a superset of ISO-8859-1, also known as Latin-1.


3 Answers

Indexes can be used only when the expression's collation matches that on the indexed column.

If the expression's COERCIBILITY is lower than that of the column (that is 2), the column's collation is casted to that of the expression, and the index is not used.

Normally, literals have COERCIBILITY of 4 and user variables that of 3, so this should be not a problem.

However, if you mix different collations in a JOIN or UNION, the cast order is not guaranteed.

In this case you should provide explicit collation to the column your are casting (most probably, you want to cast latin1 to UTF8), and this should be the collation of the column you are casting to:

SELECT  *
FROM    utf_table
JOIN    latin_table
ON      utf_column = latin_column COLLATE UTF8_GENERAL_CI
like image 139
Quassnoi Avatar answered Sep 24 '22 23:09

Quassnoi


I understand this now, the tables i was joining were not of the same char set

DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

Once I changed these the indexes worked fine.

like image 35
Ke. Avatar answered Sep 24 '22 23:09

Ke.


The indexes themselves are stored with the same encoding as the columns they are indexing. Comparing a UTF-8 character to a latin1 character cannot make use of the index, because it would need to convert both to the same encoding, since the index optimizations are performed on a byte level (and ß in latin1 has a different byte sequence than in UTF-8).

like image 42
soulmerge Avatar answered Sep 22 '22 23:09

soulmerge