
When to use utf-8 and when to use latin1 in MySQL?

I know that MySQL defaults to latin1 encoding, and apparently it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8. Is that correct?

I am working on a site that I hope will be used globally. Do I absolutely need to have utf-8? Or will I be able to get away with using latin1?

Also, I tried to change some tables from latin1 to utf8 but I got this error: "Specified key was too long; max key length is 1000 bytes". Does anyone know the solution to this? And should I really solve that, or might latin1 be enough?

Thanks, Alex

Genadinik asked Feb 01 '11 00:02


People also ask

What is the difference between UTF-8 and Latin-1?

What is the difference between utf8 and latin1? They are different encodings. Only the ASCII characters are mapped to the same byte sequences in both; accented letters take one byte in Latin1 but two bytes in UTF-8. UTF-8 is one encoding of Unicode with all its code points; Latin1 encodes at most 256 characters.

Should I use UTF-8 or utf8mb4?

The difference between MySQL's utf8 and utf8mb4 is that the former can only store characters that need up to 3 bytes, while the latter can also store 4-byte characters. In Unicode terms, utf8 can only store characters in the Basic Multilingual Plane, while utf8mb4 can store any Unicode character.

When should I use UTF-8 encoding?

Why use UTF-8? An HTML page can only be in one encoding. You cannot encode different parts of a document in different encodings. A Unicode-based encoding such as UTF-8 can support many languages and can accommodate pages and forms in any mixture of those languages.

What is the difference between utf8mb4 and UTF-8 charsets in MySQL?

MySQL's utf8 charset can store only 1-, 2- or 3-byte characters, while utf8mb4 can store 4-byte characters as well. The characters supported by utf8 are a subset of those supported by utf8mb4.


2 Answers

it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8 - is that correct?

It takes 1 byte to store a latin1 character and 1 to 3 bytes to store a character in MySQL's utf8.

If you only use basic Latin characters and punctuation in your strings (code points 0 to 127 in Unicode), both charsets will occupy the same length.
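As a rough illustration of the byte counts, here is a small Python sketch. It is not MySQL itself: it uses Python's `cp1252` codec as a stand-in for MySQL's latin1 (an assumption made for this example), and the helper name `encoded_length` is made up for illustration.

```python
def encoded_length(text, encoding):
    """Return how many bytes `text` needs in `encoding`,
    or None if the encoding cannot represent it."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

# 'a' (ASCII), 'e' with acute accent (U+00E9), and a CJK character (U+65E5)
samples = ["a", "\u00e9", "\u65e5"]

for ch in samples:
    # Print the code point rather than the character to avoid console encoding issues
    print(f"U+{ord(ch):04X}",
          "cp1252:", encoded_length(ch, "cp1252"),
          "utf-8:", encoded_length(ch, "utf-8"))
```

The ASCII character costs one byte either way, the accented letter grows from one byte to two, and the CJK character cannot be stored in the single-byte charset at all.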

Also, I tried to change some tables from latin1 to utf8 but I got this error: "Specified key was too long; max key length is 1000 bytes". Does anyone know the solution to this? And should I really solve that, or might latin1 be enough?

If you have a column of VARCHAR(334) or longer, MyISAM won't let you create an index on it, since there is a remote possibility that the column will occupy more than 1000 bytes (334 characters × 3 bytes = 1002 bytes).
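The arithmetic behind that limit can be sketched in Python. This is only a back-of-the-envelope calculation: the 1000-byte limit comes from the error message above, the per-character maximums are the usual worst-case widths for each charset, and the function names are invented for this example. It ignores any per-key overhead MySQL may add.

```python
MYISAM_MAX_KEY_BYTES = 1000  # limit quoted in the error message

# Worst-case bytes per character that MySQL reserves for each charset
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}

def max_indexable_chars(charset, limit=MYISAM_MAX_KEY_BYTES):
    """Longest column (in characters) whose worst case still fits in the key."""
    return limit // MAX_BYTES_PER_CHAR[charset]

def fits_in_key(varchar_length, charset, limit=MYISAM_MAX_KEY_BYTES):
    """True if a full-column index on VARCHAR(varchar_length) stays within the limit."""
    return varchar_length * MAX_BYTES_PER_CHAR[charset] <= limit

for charset in MAX_BYTES_PER_CHAR:
    print(charset, max_indexable_chars(charset))
```

This is why a column that indexed fine under latin1 can start failing after conversion: the same VARCHAR length is budgeted at three times the bytes.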

Note that keys of such length are rarely useful. You can create a prefixed index which will be almost as selective for any real-world data.
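A prefixed index only covers the first N characters of the column. As a minimal sketch, the Python helper below builds such a statement as a string; the table, column and index names and the prefix length of 100 are placeholders invented for this example, not taken from the question, and the identifiers are assumed to be trusted (no escaping is done).

```python
def prefix_index_sql(table, column, index_name, prefix_chars):
    """Build a MySQL statement that indexes only the first
    `prefix_chars` characters of `column`."""
    if prefix_chars <= 0:
        raise ValueError("prefix_chars must be positive")
    return (
        f"ALTER TABLE `{table}` "
        f"ADD INDEX `{index_name}` (`{column}`({prefix_chars}));"
    )

# Hypothetical names: with utf8, 100 characters reserve at most 300 bytes,
# well under the 1000-byte limit.
statement = prefix_index_sql("articles", "title", "idx_title_prefix", 100)
print(statement)
```

How long the prefix needs to be depends on the data; a shorter prefix keeps the index small, while a longer one distinguishes more rows.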

Quassnoi answered Oct 09 '22 14:10


At a bare minimum I would suggest using UTF-8. Your data will be compatible with nearly every other database out there nowadays, since the vast majority of them use UTF-8.

If you go with LATIN1/ISO-8859-1, you risk data not being stored properly, because it only covers Western European characters and cannot represent most other scripts... so you might run into garbled text like the left side of this image:

[image]

If you go with UTF-8, you don't need to deal with these headaches.

Regarding your error, it sounds like you need to optimize your database. Consider this: http://bugs.mysql.com/bug.php?id=4541#c284415

It would help if you gave specifics on your table schema and column for that issue.

meder omuraliev answered Oct 09 '22 14:10