Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What does collation mean?

What does collation mean in SQL, and what does it do?

like image 256
LOVE_KING Avatar asked Oct 12 '22 12:10

LOVE_KING


3 Answers

Collation can be simply thought of as sort order.

In English (and it's strange cousin, American), collation may be a pretty simple matter consisting of ordering by the ASCII code.

Once you get into those strange European languages with all their accents and other features, collation changes. For example, though the different accented forms of a may exist at disparate code points, they may all need to be sorted as if they were the same letter.

like image 62
paxdiablo Avatar answered Oct 18 '22 03:10

paxdiablo


Besides the "accented letters are sorted differently than unaccented ones" in some Western European languages, you must take into account the groups of letters, which sometimes are sorted differently, also.

Traditionally, in Spanish, "ch" was considered a letter in its own right, same with "ll" (both of which represent a single phoneme), so a list would get sorted like this:

  • caballo
  • cinco
  • coche
  • charco
  • chocolate
  • chueco
  • dado
  • (...)
  • lámpara
  • luego
  • llanta
  • lluvia
  • madera

Notice all the words starting with single c go together, except words starting with ch which go after them, same with ll-starting words which go after all the words starting with a single l. This is the ordering you'll see in old dictionaries and encyclopedias, sometimes even today by very conservative organizations.

The Royal Academy of the Language changed this to make it easier for Spanish to be accomodated in the computing world. Nevertheless, ñ is still considered a different letter than n and goes after it, and before o. So this is a correctly ordered list:

  • Namibia
  • número
  • ñandú
  • ñú
  • obra
  • ojo

By selecting the correct collation, you get all this done for you, automatically :-)

like image 43
Joe Pineda Avatar answered Oct 18 '22 01:10

Joe Pineda


Rules that tell how to compare and sort strings: letters order; whether case matters, whether diacritics matter etc.

For instance, if you want all letters to be different (say, if you store filenames in UNIX), you use UTF8_BIN collation:

SELECT  'A' COLLATE UTF8_BIN = 'a' COLLATE UTF8_BIN

---
0

If you want to ignore case and diacritics differences (say, for a search engine), you use UTF8_GENERAL_CI collation:

SELECT  'A' COLLATE UTF8_GENERAL_CI = 'ä' COLLATE UTF8_GENERAL_CI

---
1

As you can see, this collation (comparison rule) considers capital A and lowecase ä the same letter, ignoring case and diacritic differences.

like image 14
Quassnoi Avatar answered Oct 18 '22 01:10

Quassnoi