What are the advantages and disadvantages of using utf8 as a charset compared with latin1?
If utf8 can support more characters and is used consistently, wouldn't it always be the better choice? Is there any reason to choose latin1?
UTF8 Advantages:
Supports most languages, including RTL languages such as Hebrew.
No translation needed when importing/exporting data to UTF8-aware components (JavaScript, Java, etc).
UTF8 Disadvantages:
Non-ASCII characters will take more time to encode and decode, due to their more complex encoding scheme.
Non-ASCII characters will take more space, as they may be stored using more than 1 byte (this applies to any character outside the 128 characters of the ASCII set). Because MySQL's utf8 uses up to 3 bytes per character, a CHAR(10) or VARCHAR(10) field may need up to 30 bytes to store some UTF8 characters.
Collations other than utf8_bin will be slower, as the sort order will not map directly to the character encoding order, and will require translation in some stored procedures (as variables default to the utf8_general_ci collation).
If you need to JOIN UTF8 and non-UTF8 fields, MySQL will impose a SEVERE performance hit. Queries that would otherwise be sub-second could take minutes if the joined fields use different character sets or collations.
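To make the storage cost above concrete, here is a small Python sketch that compares encoded sizes. It is only an illustration: the helper name byte_len and the sample strings are made up for this example, and Python's latin-1 codec (ISO-8859-1) is close to, but not identical to, MySQL's latin1.

```python
# Compare how many bytes the same text needs in latin1 and in UTF-8.
# byte_len and the sample strings are hypothetical, chosen for illustration.

def byte_len(text, encoding):
    """Return the encoded size in bytes, or None if the text cannot be encoded."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None

samples = ["hello", "café", "日本語"]

for text in samples:
    # Columns printed: text, size in latin1, size in UTF-8
    print(text, byte_len(text, "latin-1"), byte_len(text, "utf-8"))

# Ten characters that each need 3 bytes in UTF-8, the worst case
# for a 10-character column under MySQL's 3-byte utf8.
ten_chars = "日" * 10
worst_case = byte_len(ten_chars, "utf-8")
```

Pure ASCII text costs the same in both encodings, accented Latin letters grow from 1 byte to 2, and the ten-character worst case comes to 30 bytes, matching the CHAR(10) figure above.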
Bottom line:
If you don't need to support non-Latin1 languages, want to achieve maximum performance, or already have tables using latin1, choose latin1.
Otherwise, choose UTF8.
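If you are unsure whether your data really fits in latin1, one rough way to check is to try encoding a sample of it. The sketch below assumes the values are already available as Python strings; the function name non_latin1_values is hypothetical, and Python's latin-1 codec is only an approximation of MySQL's latin1.

```python
def non_latin1_values(values):
    """Return the values that cannot be encoded as latin-1.

    An empty result suggests the sample would fit in a latin1 column.
    """
    offenders = []
    for value in values:
        try:
            value.encode("latin-1")
        except UnicodeEncodeError:
            # At least one character falls outside the latin-1 range.
            offenders.append(value)
    return offenders


sample = ["Zoë", "naïve", "Łódź", "שלום"]
print(non_latin1_values(sample))
```

Any value reported here (the Polish and Hebrew strings in this sample) would be lost or mangled in a latin1 column, which is a sign to choose UTF8 instead.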