
Does a utf8_unicode_cs collation exist?

Does anyone know whether a utf8_unicode_cs collation exists for MySQL? So far, my searches have come up dry. If it simply doesn't exist yet, is it fairly straightforward to create one? Or is there some way to use utf8_unicode_ci or utf8_bin to "simulate" what one would expect from a utf8_unicode_cs collation?

robguinness asked Mar 05 '13 07:03



People also ask

What is collate utf8_unicode_ci?

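In short: utf8_unicode_ci uses the Unicode Collation Algorithm (UCA) as defined in the Unicode standards, whereas utf8_general_ci uses a simpler sort order that gives less accurate results for some languages. For example, utf8_general_ci treats ß as equal to s, while utf8_unicode_ci expands it and treats ß as equal to ss.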

What collation should I use for utf8mb4?

Each character set has a default collation. For example, in MySQL 8.0 the default collations for utf8mb4 and latin1 are utf8mb4_0900_ai_ci and latin1_swedish_ci, respectively.

What is the default collation for MySQL 8?

The default MySQL server character set and collation are utf8mb4 and utf8mb4_0900_ai_ci, but you can specify character sets at the server, database, table, column, and string literal levels.

What is the difference between utf8mb3 and utf8mb4?

utf8mb3 supports only characters in the Basic Multilingual Plane (BMP). utf8mb4 additionally supports supplementary characters that lie outside the BMP. utf8mb3 uses a maximum of three bytes per character. utf8mb4 uses a maximum of four bytes per character.


1 Answer

This is an old question, but it does not seem to have been superseded by any other, so I thought it worth posting that things have changed.

MySQL version 8 now has the following collations for utf8mb4:

 utf8mb4_0900_ai_ci 
 utf8mb4_0900_as_ci
 utf8mb4_0900_as_cs
 ... and many language-specific variants of these.

(There is no _ai_cs variant as far as I know, but that would be less useful anyway: there are few reasons to group a with á while separately grouping A with Á.)

The purpose of the original question's hypothetical "utf8_unicode_cs" is fulfilled by utf8mb4_0900_as_cs. (The 0900 means it is based on UCA version 9.0.0, as opposed to UCA 4.0.0, which utf8_unicode_ci uses.)
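To make the suffixes concrete: `_ai`/`_as` decide whether accents matter, `_ci`/`_cs` decide whether case matters, and `_bin` compares raw code points. The Python sketch below only approximates the equality behaviour of these suffixes using Unicode normalization and case folding; MySQL itself compares UCA weights, so the helper names (`collation_key`, `equal`) and the approach are assumptions for illustration, not MySQL's implementation.

```python
import unicodedata

# Hypothetical flags per suffix: (accent_sensitive, case_sensitive).
# Rough approximation only, NOT MySQL's UCA weight comparison.
COLLATIONS = {
    "ai_ci": (False, False),
    "as_ci": (True, False),
    "as_cs": (True, True),
}


def collation_key(s: str, accent_sensitive: bool, case_sensitive: bool) -> str:
    # Decompose so that precomposed "é" and "e" + U+0301 become the same sequence.
    s = unicodedata.normalize("NFD", s)
    if not accent_sensitive:
        # Drop nonspacing combining marks (category Mn), e.g. the acute accent U+0301.
        s = "".join(c for c in s if unicodedata.category(c) != "Mn")
    if not case_sensitive:
        s = s.casefold()
    return s


def equal(a: str, b: str, collation: str) -> bool:
    if collation == "bin":
        # Binary collation: exact code point comparison, no normalization.
        return a == b
    accent_sensitive, case_sensitive = COLLATIONS[collation]
    return (collation_key(a, accent_sensitive, case_sensitive)
            == collation_key(b, accent_sensitive, case_sensitive))


if __name__ == "__main__":
    # Pairs: case difference, accent difference, precomposed vs decomposed é.
    pairs = [("a", "A"), ("a", "\u00e1"), ("\u00e9", "e\u0301")]
    for name in ("ai_ci", "as_ci", "as_cs", "bin"):
        print(name, [equal(x, y, name) for x, y in pairs])
```

The last pair shows why an `_as_cs` collation is not the same as `_bin`: in UCA terms, a precomposed é and e followed by a combining acute accent are canonically equivalent and compare equal, whereas a binary comparison treats them as different strings.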

To use these, you need to change the column's character set from utf8 to utf8mb4, but that is generally a good idea anyway, because the old 3-byte-max encoding (utf8, an alias for utf8mb3) cannot store emoji and other characters outside the Basic Multilingual Plane (BMP).
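Before converting, it can help to see which values actually need the fourth byte. The Python sketch below (helper names `max_utf8_bytes` and `fits_utf8mb3` are hypothetical) reports the widest UTF-8 encoding of any character in a string and whether every code point stays within the BMP (U+0000 to U+FFFF), which is what utf8mb3 can hold. It assumes well-formed text; a lone surrogate would make `encode("utf-8")` raise.

```python
def max_utf8_bytes(s: str) -> int:
    # Number of UTF-8 bytes needed by the widest single character in s (0 for "").
    return max((len(c.encode("utf-8")) for c in s), default=0)


def fits_utf8mb3(s: str) -> bool:
    # utf8mb3 only covers the Basic Multilingual Plane: code points up to U+FFFF,
    # i.e. characters that need at most 3 bytes in UTF-8.
    return all(ord(c) <= 0xFFFF for c in s)


if __name__ == "__main__":
    # ASCII, Latin-1 accent, euro sign (BMP, 3 bytes), emoji (outside BMP, 4 bytes).
    for text in ("abc", "caf\u00e9", "\u20ac", "\U0001F600"):
        print(repr(text), max_utf8_bytes(text), fits_utf8mb3(text))
```

The conversion itself is done in MySQL, for example with `ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_cs`, which changes the character set and collation of the table's character columns.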

Source: https://dev.mysql.com/doc/refman/8.0/en/charset-mysql.html

AH of LAGB answered Oct 21 '22 17:10
