
Collation STRENGTH and local language relation

I have read the following in Collator's Javadoc:

"The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical."

Does this mean that I should set the STRENGTH based on the language I am using? If so, can someone suggest the defaults for these locales: en_US, es_US, fr_CA, es_ES, es_CL, and Portuguese (pt)?

Aravind Yarram asked Sep 18 '10 01:09



1 Answer

It really depends on what you're trying to do. The following is true for most (all?) languages that use the Latin alphabet:

  • Primary
    • Different: a, á, Á, b
    • Same: á, â
    • Same: a, A
  • Secondary
    • Different: a, á, Á, b
    • Different: á, â
    • Same: a, A
  • Tertiary
    • Different: a, á, Á, b
    • Different: á, â
    • Different: a, A
  • Identical
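    • Also distinguishes differences you can't see, for example between a precomposed accented letter (Á, U+00C1) and the base letter followed by a combining accent (A + U+0301); in Java this applies when decomposition is set to NO_DECOMPOSITION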
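The quickest way to check a list like this on your own JDK is to compare a few pairs at each strength. The following is a minimal sketch, assuming `java.text.Collator` with `Locale.US` and decomposition explicitly set to `NO_DECOMPOSITION`; the class name `StrengthDemo` and the helper `sameAt` are hypothetical names made up for illustration. On the assumptions above, the accent-only pair a/á should compare as equal at PRIMARY, which is what the Javadoc describes, so it is worth running the check rather than relying on any table.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthDemo {

    // Hypothetical helper: true if the two strings compare as equal
    // at the given strength under the en_US collation rules.
    static boolean sameAt(int strength, String left, String right) {
        Collator collator = Collator.getInstance(Locale.US);
        // Made explicit so the IDENTICAL comparison does not normalize the input.
        collator.setDecomposition(Collator.NO_DECOMPOSITION);
        collator.setStrength(strength);
        return collator.compare(left, right) == 0;
    }

    public static void main(String[] args) {
        String[] names = { "PRIMARY", "SECONDARY", "TERTIARY", "IDENTICAL" };
        int[] strengths = {
            Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
        };
        // Unicode escapes avoid any dependence on the source file encoding.
        String[][] pairs = {
            { "a", "b" },            // different base letters
            { "a", "\u00e1" },       // a vs. a with acute accent
            { "\u00e1", "\u00e2" },  // acute vs. circumflex
            { "a", "A" },            // case only
            { "\u00c1", "A\u0301" }  // precomposed vs. base + combining accent
        };
        for (int i = 0; i < strengths.length; i++) {
            for (String[] pair : pairs) {
                System.out.printf("%-9s %s vs %s: %s%n", names[i], pair[0], pair[1],
                        sameAt(strengths[i], pair[0], pair[1]) ? "same" : "different");
            }
        }
    }
}
```

Swapping `Locale.US` for another locale in `sameAt` shows whether that locale's rules change any of the results.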

There will be slight variations between languages, but in essence:

  • If you want case-sensitive comparison, use Tertiary.
  • For case-insensitive comparison, use either Primary or Secondary depending on whether you want á to be grouped with â.
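  • Some of the results above are strange: a is listed as different from á even at Primary, and á as different from Á even at Primary/Secondary, although the Javadoc classes accents as secondary differences and case as a tertiary difference. I don't know why; a bug, maybe?
  • I don't know how these levels map onto non-Latin scripts.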
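The rules of thumb above can be sketched as a small helper that picks the strength from what the comparison needs, rather than from the language. This assumes `java.text.Collator`; the class `StrengthChooser` and the method `collatorFor` are hypothetical names, and the locale tags are my reading of the locales listed in the question.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthChooser {

    // Hypothetical helper: maps two yes/no requirements to a Collator strength.
    // TERTIARY also distinguishes accents, so "case-sensitive but
    // accent-insensitive" cannot be expressed through strength alone.
    static Collator collatorFor(Locale locale, boolean caseSensitive, boolean accentSensitive) {
        Collator collator = Collator.getInstance(locale);
        if (caseSensitive) {
            collator.setStrength(Collator.TERTIARY);
        } else if (accentSensitive) {
            collator.setStrength(Collator.SECONDARY);
        } else {
            collator.setStrength(Collator.PRIMARY);
        }
        return collator;
    }

    public static void main(String[] args) {
        // Assumed equivalents of the locales named in the question.
        String[] tags = { "en-US", "es-US", "fr-CA", "es-ES", "es-CL", "pt" };
        for (String tag : tags) {
            Locale locale = Locale.forLanguageTag(tag);
            // Case-insensitive, accent-insensitive comparison for this locale.
            Collator loose = collatorFor(locale, false, false);
            System.out.println(tag + ": resume vs R\u00e9sum\u00e9 -> "
                    + loose.compare("resume", "R\u00e9sum\u00e9"));
        }
    }
}
```

The locale still matters for ordering (which letter sorts before which), so the collator should be created for the user's locale even when the strength is chosen the same way everywhere.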
Johannes Sasongko answered Oct 19 '22 15:10
