Utf8_general_ci or utf8mb4 or...?

Tags: utf-8, localization, utf-16, utf-32, utf8mb4

utf16 or utf32? I'm trying to store content in a lot of languages. Some of these languages use double-wide fonts (for example, Japanese characters are frequently rendered twice as wide as English ones). I'm not sure which character set my database should use. Any information about the differences between these four charsets would be appreciated.

552

asked Jul 18 '12 02:07

Wolfpack'08

1 Answer

MySQL's utf32 and utf8mb4 (like standard UTF-8) can directly store any character defined by Unicode. utf32 is fixed-size at 4 bytes per character, whereas utf8mb4 uses between 1 and 4 bytes per character.
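A quick way to see the size difference is to encode a few characters with Python's built-in codecs. This is a minimal sketch that assumes Python's `utf-8` and `utf-32-be` codecs produce the same byte counts MySQL's utf8mb4 and utf32 charsets use per character; the `-be` variant is chosen so Python does not prepend a 4-byte byte-order mark.

```python
# Compare per-character storage size in UTF-8 (the encoding behind MySQL's
# utf8mb4) and UTF-32 (the encoding behind MySQL's utf32).
# "utf-32-be" avoids the 4-byte BOM that the plain "utf-32" codec would add.
samples = {
    "A": "ASCII letter",
    "é": "Latin-1 supplement",
    "あ": "Japanese hiragana",
    "😀": "emoji (outside the first 65,536 code points)",
}

def byte_sizes(ch):
    """Return (UTF-8 length, UTF-32 length) in bytes for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-32-be"))

for ch, label in samples.items():
    utf8_len, utf32_len = byte_sizes(ch)
    print(f"U+{ord(ch):04X} {label}: utf8mb4={utf8_len} bytes, utf32={utf32_len} bytes")
```

For text that is mostly Japanese, this means utf8mb4 spends 3 bytes per character where utf32 always spends 4, and ASCII markup or Latin text costs only 1 byte each in utf8mb4.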

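utf8mb3 (the charset MySQL has historically called utf8) can only store the first 65,536 code points, the Basic Multilingual Plane, using 1 to 3 bytes per character. That covers the commonly used CJKV (Chinese, Japanese, Korean, Vietnamese) characters, but not supplementary characters such as emoji or the rarer CJK ideographs in Extension B and later.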
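Because utf8mb3 stops at U+FFFF, text containing a supplementary character (an emoji, or an Extension B ideograph such as 𠮷, U+20BB7) cannot go into a utf8mb3 column. The helpers below, `fits_utf8mb3` and `first_unsupported`, are hypothetical names for a sketch that checks a string's code points before insertion; they do not talk to MySQL.

```python
# Hypothetical helpers: check whether a string fits MySQL's utf8mb3,
# which only covers code points U+0000..U+FFFF (the Basic Multilingual Plane).
BMP_MAX = 0xFFFF

def fits_utf8mb3(text):
    """Return True if every character is inside the Basic Multilingual Plane."""
    return all(ord(ch) <= BMP_MAX for ch in text)

def first_unsupported(text):
    """Return (index, character) of the first non-BMP character, or None."""
    for i, ch in enumerate(text):
        if ord(ch) > BMP_MAX:
            return i, ch
    return None

for s in ["こんにちは", "Xin chào", "안녕하세요", "吉野家", "𠮷野家", "hi 😀"]:
    print(repr(s), fits_utf8mb3(s), first_unsupported(s))
```

An equivalent test is that every character encodes to at most 3 bytes in UTF-8; checking the code point directly just avoids the encoding step.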

utf16 uses 2 bytes for each of the first 65,536 code points and 4 bytes (a surrogate pair) for everything else.
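The 4-byte case in UTF-16 is a surrogate pair: two 16-bit code units that together carry a 20-bit offset. The sketch below computes the pair by hand using the standard UTF-16 scheme and cross-checks it against Python's `utf-16-be` codec (big-endian without a BOM is an assumption, chosen so only the character's own bytes are counted). The function name `utf16_units` is hypothetical.

```python
# Sketch: split a code point into UTF-16 code units by hand.
# Code points up to U+FFFF need one 16-bit unit (2 bytes); anything above
# is written as a high surrogate (D800-DBFF) plus a low surrogate (DC00-DFFF).
def utf16_units(cp):
    if cp <= 0xFFFF:
        return [cp]
    v = cp - 0x10000              # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)     # top 10 bits
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits
    return [high, low]

for ch in ["A", "あ", "😀"]:
    units = utf16_units(ord(ch))
    # Cross-check against Python's codec (big-endian, no byte-order mark).
    encoded = ch.encode("utf-16-be")
    assert encoded == b"".join(u.to_bytes(2, "big") for u in units)
    print(f"U+{ord(ch):04X}: {[hex(u) for u in units]} -> {len(encoded)} bytes")
```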

As for fonts, glyph width is strictly a rendering concern: how wide a character is drawn has no effect on how many bytes it takes to store.
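To illustrate that display width and storage size are independent, the sketch below compares Unicode's East Asian Width property (read with Python's standard `unicodedata.east_asian_width`) against the UTF-8 byte count. The property is only a hint used by terminals and layout engines; the width actually drawn still depends on the font.

```python
import unicodedata

# East Asian Width classes: "Na" = narrow, "H" = halfwidth,
# "W" = wide, "F" = fullwidth.
for ch in ["A", "ｱ", "あ", "Ａ"]:
    width = unicodedata.east_asian_width(ch)
    size = len(ch.encode("utf-8"))
    print(f"{ch!r} U+{ord(ch):04X}: width class {width}, {size} bytes in UTF-8")
```

The halfwidth katakana ｱ is drawn narrow yet takes the same 3 bytes as the wide hiragana あ, while the fullwidth Ａ takes 3 bytes even though it is the "same" letter as the 1-byte A.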

Recommended reading: Joel Spolsky, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

See also the MySQL documentation on Unicode support.

139

answered Sep 23 '22 02:09

Ignacio Vazquez-Abrams
