
How to make MySQL handle UTF-8 properly

Tags: mysql, utf-8

One of the responses to a question I asked yesterday suggested that I should make sure my database can handle UTF-8 characters correctly. How can I do this with MySQL?

Ben asked Oct 14 '08 18:10


People also ask

Does MySQL support UTF-8?

MySQL supports multiple Unicode character sets: utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character.

What encoding does MySQL use?

MySQL 4.1 and above has a character set that it calls utf8 (an alias for utf8mb3) but which is actually only a subset of UTF-8: it stores at most three bytes per character. It was not the server default; that was latin1 until MySQL 8.0, which defaults to utf8mb4. Use utf8mb4 as your charset if you want "full" UTF-8.

Which UTF-8 collation should I use?

If you elect to use UTF-8 as your character set, always use utf8mb4 (for example with the utf8mb4_unicode_ci collation). You should not use MySQL's utf8, because it differs from proper UTF-8 encoding: it doesn't offer full Unicode support, which can lead to data loss or security issues.


2 Answers

Update:

Short answer - You should almost always be using the utf8mb4 charset and utf8mb4_unicode_ci collation.

To alter database:

ALTER DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; 
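Note that ALTER DATABASE only changes the default used for tables created afterwards; existing tables keep the character set they already have. A minimal sketch of converting one existing table, assuming a hypothetical table named mytable (take a backup first, since the conversion rewrites the data):

```sql
-- Hypothetical table name; repeat for each table that needs converting.
-- CONVERT TO changes the table default and re-encodes the existing
-- character columns (CHAR, VARCHAR, TEXT).
ALTER TABLE mytable
  CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

On older servers, indexed VARCHAR columns may hit index length limits after the conversion, because each character can now take up to four bytes.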

See:

  • Aaron's comment on this answer (How to make MySQL handle UTF-8 properly)

  • What's the difference between utf8_general_ci and utf8_unicode_ci

  • Conversion guide: https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-conversion.html

Original Answer:

MySQL 4.1 and above supports UTF-8, but it is not necessarily the default: before MySQL 8.0 the compiled-in default was latin1. You can verify what your server uses in your my.cnf file. Remember to set both client and server (default-character-set and character-set-server).

If you have existing data that you wish to convert to UTF-8, dump your database, and import it back as UTF-8 making sure:

  • use SET NAMES utf8 before you query/insert into the database
  • use DEFAULT CHARSET=utf8 when creating new tables
  • at this point your MySQL client and server should be in UTF-8 (see my.cnf). Remember that any languages you use (such as PHP) must be UTF-8 as well. Some versions of PHP use their own MySQL client library, which may not be UTF-8 aware.
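
The first two points might look like this in practice. This is only a sketch: it uses utf8mb4, as recommended in the update above, rather than the utf8 of the original answer, and the table and column names are hypothetical.

```sql
-- Tell the server which character set this connection sends and expects.
SET NAMES utf8mb4;

-- Hypothetical table; DEFAULT CHARSET applies to its character columns
-- unless a column declares its own character set.
CREATE TABLE example_notes (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  body TEXT NOT NULL
) DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
```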

If you do want to migrate existing data, remember to back up first! Lots of weird chopping of data can happen when things don't go as planned!

Some resources:

  • complete UTF-8 migration (cdbaby.com)
  • article on UTF-8 readiness of php functions (note some of this information is outdated)

Owen answered Oct 08 '22 03:10


To make this 'permanent', in my.cnf:

[client]
default-character-set=utf8

[mysqld]
character-set-server = utf8

To check, go to the client and show some variables:

SHOW VARIABLES LIKE 'character_set%'; 

Verify that they're all utf8, except ..._filesystem, which should be binary, and ..._dir, which points to a directory somewhere in the MySQL installation.
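
SHOW VARIABLES reports the server and connection settings, not what each database and table was actually created with. As a rough sketch, the information_schema queries below show those, assuming a database named dbname as in the ALTER DATABASE example:

```sql
-- Default character set and collation of one database.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'dbname';

-- Collation (and hence character set) of each table in that database.
SELECT TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'dbname';
```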

Javier answered Oct 08 '22 02:10