I am aware that similar questions have been asked before, but we need a more definitive answer.
Is it safe to update MySQL tables encoded in utf8
to utf8mb4
in all cases. More specifically, even for varchar
fields with strings generated using for example (in Java):
new BigInteger(130, random).toString(32)
From our understanding utf8mb4
is a superset of utf8
so our assumption would be that everything should be fine, but we would love some input from more MySQL superusers.
How the data was originally inserted in MySQL is irrelevant. Let's suppose you used the entire character set of utf8, e.g. the BMP characters.
utf8mb4 is a superset of utf8mb3 (alias utf8) as documented here
10.9.7 Converting Between 3-Byte and 4-Byte Unicode Character Sets
One advantage of converting from utf8mb3 to utf8mb4 is that this enables applications to use supplementary characters. One tradeoff is that this may increase data storage space requirements.
In terms of table content, conversion from utf8mb3 to utf8mb4 presents no problems:
For a BMP character, utf8mb4 and utf8mb3 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8mb4 requires four bytes to store it, whereas utf8mb3 cannot store the character at all. When converting utf8mb3 columns to utf8mb4, you need not worry about converting supplementary characters because there will be none.
In terms of table structure, these are the primary potential incompatibilities:
For the variable-length character data types (VARCHAR and the TEXT types), the maximum permitted length in characters is less for utf8mb4 columns than for utf8mb3 columns.
For all character data types (CHAR, VARCHAR, and the TEXT types), the maximum number of characters that can be indexed is less for utf8mb4 columns than for utf8mb3 columns.
Consequently, to convert tables from utf8mb3 to utf8mb4, it may be necessary to change some column or index definitions.
Personally I had some issues with indexes on relative long texts where the maximum size of the index was reached. It was a search index, not a unique index, so the workaround was to use less characters in the index. See also this answer
Of course I suppose that you will use the same collation. If you change the collation other issues apply.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With