Is this a safe way to convert MySQL tables from latin1 to utf-8?

I need to change all the tables in one of my databases from latin1 to utf-8 (with utf8_bin collation).

I have dumped the database, created a test database from it, and run the following without any errors or warnings for each table:

ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin

Is it safe for me to repeat this on the real database? The data seems fine by inspection...

nfm asked May 31 '11 05:05



3 Answers

I've done this a few times on production databases in the past (converting from the old default Swedish encoding to latin1). When MySQL encounters a character that cannot be translated to the target encoding, it aborts the conversion and leaves the table unchanged. Therefore, I'd consider the ALTER TABLE statement safe to run.
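To illustrate why an untranslatable character is unlikely in the latin1 to utf8 direction specifically, here is a minimal Python sketch. It assumes Python's "latin-1" codec is a fair stand-in for MySQL's latin1 (MySQL's latin1 is in fact closer to cp1252, so this is an approximation), and it only simulates the byte-level conversion; it does not talk to MySQL.

```python
# Every one of the 256 possible single-byte values decodes as Latin-1,
# and every resulting character can be encoded as UTF-8.
all_bytes = bytes(range(256))
as_utf8 = all_bytes.decode("latin-1").encode("utf-8")

# The conversion is lossless: decoding the UTF-8 result gives the same text back.
round_trip = as_utf8.decode("utf-8").encode("latin-1")

# The opposite direction is where a conversion can fail, because most
# Unicode characters have no Latin-1 representation.
try:
    "ł".encode("latin-1")
    reverse_failed = False
except UnicodeEncodeError:
    reverse_failed = True

print(len(all_bytes), "bytes in,", len(as_utf8), "bytes out")
print("reverse direction failed:", reverse_failed)
```

So an aborted conversion is more of a concern when the target character set is the smaller one.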

0xCAFEBABE answered Sep 30 '22 12:09


There are 3 different cases to consider:

The values are indeed encoded using Latin1

This is the consistent case: declared charset and content encoding match. This was the only case I covered in my initial answer.

Use the command you suggested:

ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin

Note that CONVERT TO CHARACTER SET was only added in MySQL 4.1.2, so anyone using a database installed before 2005 had to use an export/import trick. This is why so many legacy scripts and documents on the Internet still do it the old way.
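What a real conversion does to the stored bytes in this consistent case can be sketched in Python. This is a byte-level simulation only, not MySQL code; the function name convert_latin1_to_utf8 is made up for the example, and Python's "latin-1" codec is assumed to behave like MySQL's latin1 for the characters shown.

```python
def convert_latin1_to_utf8(stored: bytes) -> bytes:
    """Interpret the stored bytes as Latin-1 text, then re-encode that text as UTF-8."""
    return stored.decode("latin-1").encode("utf-8")


# "café" as it would sit in a column that really holds Latin-1 data.
stored = "café".encode("latin-1")
converted = convert_latin1_to_utf8(stored)

print(stored)     # the é is a single byte, 0xE9
print(converted)  # the é becomes two bytes, 0xC3 0xA9
```

The text stays the same while the bytes change, which is exactly what you want when the declared charset and the content match.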

The values are already encoded using utf8

In this case, you don't want MySQL to convert any data; you only need to change the column's metadata.

For this, you have to change the type to BLOB first, then to TEXT utf8 for each column, so that there are no value conversions:

ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;

This is the recommended way, and it is explicitly documented in the ALTER TABLE syntax documentation.
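To see why a plain conversion is wrong here, this Python sketch simulates both routes at the byte level. It is not MySQL code: the helper names convert_to and relabel are invented for the example, and Python codecs are assumed to stand in for the MySQL character sets.

```python
def convert_to(stored: bytes, declared: str, target: str) -> bytes:
    """Simulate a real conversion: decode with the declared charset, re-encode in the target."""
    return stored.decode(declared).encode(target)


def relabel(stored: bytes) -> bytes:
    """Simulate the BLOB route: the bytes are left untouched, only the label changes."""
    return bytes(stored)


# UTF-8 bytes sitting in a column that is declared latin1.
stored = "café".encode("utf-8")

# Converting treats each UTF-8 byte as a separate Latin-1 character (double encoding).
double_encoded = convert_to(stored, "latin-1", "utf-8")

# Going through BLOB keeps the bytes, which are already valid UTF-8.
relabelled = relabel(stored)

print(double_encoded.decode("utf-8"))  # mojibake: cafÃ©
print(relabelled.decode("utf-8"))      # café
```

The garbled "Ã©" pattern is the usual sign that UTF-8 data went through a latin1 conversion it did not need.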

The values use a different encoding

The default encoding was Latin1 for several years on some Linux distributions, so data in another encoding may have ended up in columns declared as Latin1. In this case, you have to use a combination of the two techniques:

  • Fix the table meta-data, using the BLOB type trick
  • Convert the values using CONVERT TO.
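
The two steps above can be sketched at the byte level in Python. This is only a simulation, not MySQL code, and cp1250 is picked purely as a hypothetical example of the "different encoding"; substitute whatever encoding your data really uses.

```python
ACTUAL_ENCODING = "cp1250"  # assumption for this example only

# "złoty" written by a client that used cp1250, into a column declared latin1.
stored = "złoty".encode(ACTUAL_ENCODING)

# Converting straight from the declared charset misreads the ł byte (0xB3).
direct = stored.decode("latin-1")

# Step 1: fix the metadata. The bytes do not change, only the charset
# we interpret them with (this is what the BLOB trick achieves).
relabelled_text = stored.decode(ACTUAL_ENCODING)

# Step 2: a real conversion from the correct charset to UTF-8.
fixed = relabelled_text.encode("utf-8")

print(direct)                 # z³oty (wrong character)
print(fixed.decode("utf-8"))  # złoty
```

The order matters: converting before the metadata is corrected bakes the misread characters into the data.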

Jerome answered Sep 30 '22 12:09


A straightforward conversion can break any strings that contain non-ASCII (non-7-bit) characters.

If you don't have any of those (i.e. all of your text is plain English), you'll usually be fine.

If you do have some, however, you need to convert all char/varchar/text fields to blob in an initial run, and then convert them to utf8 in a subsequent run.

See this article for detailed procedures:

http://codex.wordpress.org/Converting_Database_Character_Sets
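
As a rough illustration of the "plain English is fine" point, this Python sketch checks whether a value contains any byte above the 7-bit range. The function name has_non_ascii is made up for the example, and it works on bytes you have already exported; it is not a MySQL feature.

```python
def has_non_ascii(stored: bytes) -> bool:
    """Return True if any byte falls outside the 7-bit ASCII range (0x00-0x7F)."""
    return any(byte > 0x7F for byte in stored)


plain = b"hello world"
accented = "café".encode("latin-1")

# Pure ASCII text has identical bytes in Latin-1 and UTF-8,
# so it cannot be damaged by either kind of conversion.
same_bytes = "hello world".encode("latin-1") == "hello world".encode("utf-8")

print(has_non_ascii(plain))     # False
print(has_non_ascii(accented))  # True
print(same_bytes)               # True
```

Only the values for which the check returns True need to be inspected to decide whether the blob route is required.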

Denis de Bernardy answered Sep 30 '22 10:09