Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it safe to update tables from utf8 to utf8mb4 in MySQL?

I am aware that similar questions have been asked before, but we need a more definitive answer.

Is it safe to update MySQL tables encoded in utf8 to utf8mb4 in all cases. More specifically, even for varchar fields with strings generated using for example (in Java):

new BigInteger(130, random).toString(32)

From our understanding utf8mb4 is a superset of utf8 so our assumption would be that everything should be fine, but we would love some input from more MySQL superusers.

like image 777
Nico Huysamen Avatar asked Jan 01 '23 06:01

Nico Huysamen


1 Answers

How the data was originally inserted in MySQL is irrelevant. Let's suppose you used the entire character set of utf8, e.g. the BMP characters.

utf8mb4 is a superset of utf8mb3 (alias utf8) as documented here

10.9.7 Converting Between 3-Byte and 4-Byte Unicode Character Sets

One advantage of converting from utf8mb3 to utf8mb4 is that this enables applications to use supplementary characters. One tradeoff is that this may increase data storage space requirements.

In terms of table content, conversion from utf8mb3 to utf8mb4 presents no problems:

  • For a BMP character, utf8mb4 and utf8mb3 have identical storage characteristics: same code values, same encoding, same length.

  • For a supplementary character, utf8mb4 requires four bytes to store it, whereas utf8mb3 cannot store the character at all. When converting utf8mb3 columns to utf8mb4, you need not worry about converting supplementary characters because there will be none.

In terms of table structure, these are the primary potential incompatibilities:

  • For the variable-length character data types (VARCHAR and the TEXT types), the maximum permitted length in characters is less for utf8mb4 columns than for utf8mb3 columns.

  • For all character data types (CHAR, VARCHAR, and the TEXT types), the maximum number of characters that can be indexed is less for utf8mb4 columns than for utf8mb3 columns.

Consequently, to convert tables from utf8mb3 to utf8mb4, it may be necessary to change some column or index definitions.

Personally I had some issues with indexes on relative long texts where the maximum size of the index was reached. It was a search index, not a unique index, so the workaround was to use less characters in the index. See also this answer

Of course I suppose that you will use the same collation. If you change the collation other issues apply.

like image 128
andreaplanet Avatar answered Jan 05 '23 03:01

andreaplanet