Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL improperly sorts unicode chars with Czech collation

I have a table with collation set to cs_CZ (Czech):

    Name   | Encoding |  Collation  |    CType
-----------+----------+-------------+-------------
 foo       | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8

but when I order by string, result is not sorted as it should be according to Czech alphabet:

=> SELECT surname FROM foo ORDER BY surname;
     surname     
-----------------
A
Da
Ďb
Dc
E

So it is sorted like if the unicode accent character (Ď) was converted to its ASCII version without accent (D). But the Czech alphabet is: ... C -> D -> Ď -> E ..., so the returned order is incorrect (in this example it should be: A -> Da -> Dc -> Ďb -> E).

Is this usual PostgreSQL behavior? Is there a way how get it sorted correctly according to Czech alphabet?

EDIT: Tried on Postgres 9.1.4, both have the same behavior. It is an Arch Linux machine.
EDIT2: Adjusted example, the Ď is the real problem.

like image 431
Ondrej Machulda Avatar asked May 30 '13 15:05

Ondrej Machulda


1 Answers

It is correct. Accent for á, ď, é, ě, í, ň, ó, ť, ú, ů, ý should be ignored see article

Czech sort rules are little bit complex :)

like image 87
Pavel Stehule Avatar answered Sep 26 '22 15:09

Pavel Stehule