What is the best collation to use for MySQL with PHP? [closed]

Is there a "best" choice of collation in MySQL for a general website where you can't be sure what users will enter? I understand that the encoding should be the same everywhere: in MySQL, Apache, the HTML, and anything inside PHP.

In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I'm thinking it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before.

Darryl Hein asked Dec 15 '08 07:12



2 Answers

Actually, you probably want to use utf8_unicode_ci or utf8_general_ci.

  • utf8_general_ci sorts by stripping away all accents and then ordering the text as if it were ASCII
  • utf8_unicode_ci uses the Unicode sort order, so it sorts correctly in more languages

However, if you are only storing English text, the two should give the same results.
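To make the "strip the accents, then sort like ASCII" idea concrete, here is a rough Python sketch. It is only an approximation for illustration, not MySQL's actual algorithm: `accent_insensitive_key` is a made-up helper, and Python's default string ordering (by code point) stands in for a sort that does no folding at all.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_key(text: str) -> str:
    """Hypothetical sort key: ignore accents and case (an approximation only)."""
    return strip_accents(text).lower()


WORDS = ["Zebra", "apple", "Éclair"]

# Plain code point order: uppercase ASCII first, accented letters last.
print(sorted(WORDS))  # ['Zebra', 'apple', 'Éclair']

# Accent- and case-insensitive order, closer to what a reader expects.
print(sorted(WORDS, key=accent_insensitive_key))  # ['apple', 'Éclair', 'Zebra']
```

For plain English words without accents, the key only lowercases the text, which is why the choice between the two collations rarely matters there.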

Vegard Larsen answered Oct 21 '22 09:10


The main differences are sorting accuracy (how closely comparisons follow the rules of a given language) and performance. The one special case is utf8_bin, which compares characters by their binary values.

utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The language-specific utf8 collations (such as utf8_swedish_ci) contain additional language rules that make them the most accurate for sorting in those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.
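As a rough illustration of where the accuracy gap comes from, the Python sketch below contrasts three ways of comparing strings: by raw bytes (the utf8_bin idea), with a strictly one-character-to-one-character folding (the utf8_general_ci idea, where "ß" compares equal to "s"), and with expansions allowed (the utf8_unicode_ci idea, where "ß" compares equal to "ss"). The function names and the one-entry `ONE_TO_ONE` table are assumptions made up for this example; MySQL's real collation tables are far larger and are not reproduced here.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def binary_equal(a: str, b: str) -> bool:
    """Byte-for-byte comparison: no case or accent folding at all."""
    return a.encode("utf-8") == b.encode("utf-8")


# Hypothetical one-entry table: each character maps to exactly one character.
ONE_TO_ONE = {"ß": "s"}


def one_to_one_key(text: str) -> str:
    """Approximate key where every character folds to a single character."""
    folded = strip_accents(text).lower()
    return "".join(ONE_TO_ONE.get(ch, ch) for ch in folded)


def expansion_key(text: str) -> str:
    """Approximate key where one character may expand to several ('ß' to 'ss')."""
    return strip_accents(text.casefold())


print(binary_equal("a", "A"))                                 # False
print(one_to_one_key("Straße") == one_to_one_key("Strasse"))  # False
print(expansion_key("Straße") == expansion_key("Strasse"))    # True
```

The one-to-one version does less work per character, which fits with it being the faster but less accurate option.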

You can read more about the specific Unicode character sets in the MySQL manual: http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

Eran Galperin answered Oct 21 '22 10:10