Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql sorting language specific characters (collation)

I, as a speaker of language, which has several non-common characters like ÕÜÖÄ, think that if i create an app, which allows user to save content in estonian to database, then this app should also be able to sort data correctly according to language settings.

Also, as a spear of very small language, like estonian is, have to consider, that other people might use the app and want sorting in their language. Several european languages also have their own mix of special characters.

Its even worse, if you include languages like russian into the mix - they have completely different characters.

Now since i know the django site can display all those special characters just fine, i'm only worried about sorting, which goes on database level. So how can i ensure, that objects are read from database and sorted into correct, language specific order.

Example case: If i create database with utf8 encoding and with default collation(english utf8), then table containing values

alan
oskar
ölen
älan
öskar

Will be sorted (by "select * from test order by nimi asc;") to:

alan
älan
ölen
oskar
öskar

Which is incorrect. If i create database with POSIX collation, then same query results in:

alan
oskar
älan
ölen
öskar

This SEEMS to be correct, cause if you throw few more names (õlan, ålan) into the mix, then the results are:

alan
oskar
älan
ålan
õlan
ölen
öskar

Cause, according to estonian alphabet, they should be:

alan
oskar
õlan
älan
ölen
öskar
ålan (cause å is not in estonian alphabet)

I am using postgresql 9.1.9 and i understand that collation support is available since 9.1 (http://www.postgresql.org/docs/9.1/interactive/collation.html).

Is that what i should use?

SELECT nimi COLLATE "et_EE" FROM test ORDER BY nimi ASC;

Fails with error ERROR: collation "et_EE.utf8" for encoding "UTF8" does not exist. Why is that? Do i have to create this collation rule or something? Is this the way to go - to specify collation rule with every query, if i want results in correct order?

Alan

Edit :1 to answer question: my locale -a shows:

C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
et_EE
et_EE.iso88591
et_EE.utf8
POSIX
zh_CN.utf8
zh_SG.utf8

sudo locale-gen et_EE.UTF-8
[sudo] password for alan: 
Generating locales...
  et_EE.UTF-8... up-to-date
Generation complete.
sudo locale-gen et_EE
Generating locales...
  et_EE.ISO-8859-1... up-to-date
Generation complete.

SQL still fails though

Edit2: I think i figured it out. I had to create the collation itself in postgresql too (http://www.postgresql.org/docs/9.1/static/sql-createcollation.html):

CREATE COLLATION "et_EE" (LOCALE = "et_EE.utf8")

So i guess i have to create those shell locale files, that you told me about, zero323, and then create collations to database and then i'm all set.

like image 760
Odif Yltsaeb Avatar asked Sep 21 '13 13:09

Odif Yltsaeb


Video Answer


1 Answers

You can check installed locale using:

locale -a

If et_EE is not listed run this commands:

sudo locale-gen et_EE
sudo locale-gen et_EE.UTF-8
sudo update-locale

and try your query again.

like image 79
zero323 Avatar answered Sep 28 '22 18:09

zero323