Just take a look, please:
WITH toks AS (
  SELECT tok
    FROM
      unnest('{ь, а, чь, ча, чль, чла}'::text[]) AS tok
  ORDER BY tok COLLATE "uk_UA"
)
SELECT ROW_NUMBER() OVER() AS "#", tok FROM toks
ORDER BY tok COLLATE "uk_UA"
PostgreSQL 9.3 (Ubuntu) gives me this result:
 # | tok 
---+-----
 1 | а
 2 | ча
 3 | чль
 4 | чла
 5 | чь
 6 | ь
(6 rows)
Here rows 1, 2, 5 and 6 are sorted properly ("ь" goes after "а"), while rows 3 and 4 are sorted incorrectly ("а" goes after "ь").
All the letters are Cyrillic; I've checked many times.
What's wrong here, and how can I work around it?
UPDATE: this is a glibc bug which was recently fixed upstream: https://sourceware.org/bugzilla/show_bug.cgi?id=17293
UPDATE2: Please note my own answer below.
PostgreSQL relies on the operating system's locale to sort.
See how Ubuntu 14.04 sorts that list:
# locale-gen uk_UA.UTF-8
Generating locales...
  uk_UA.UTF-8... done
Generation complete.
# cat >file
ь
а
чь
ча
чль
чла
# LC_ALL=uk_UA.UTF-8 sort file
а
ча
чль
чла
чь
ь
In the comments you say it's different, but what I get here is exactly the same order as your query.
Indeed, чль comes before чла, which looks intuitively wrong, but I don't know Cyrillic.
You may look at /usr/share/i18n/locales/uk_UA for the definition of the locale, and report it as an Ubuntu bug against the locales package.
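To check whether the odd ordering comes from the OS locale data rather than from PostgreSQL, it can help to compare the locale-aware sort with a plain byte-order sort. The following is only a sketch: the helper names tokens and byte_sort are made up for illustration, and the locale-aware part runs only if uk_UA.UTF-8 appears in the output of locale -a.

```shell
#!/bin/sh
# Hypothetical helper: emit the six tokens from the question, one per line.
tokens() { printf '%s\n' ь а чь ча чль чла; }

# Hypothetical helper: sort stdin by raw bytes (C locale), with no language rules.
byte_sort() { LC_ALL=C sort; }

echo 'Byte order (C locale):'
tokens | byte_sort

# Locale-aware order, attempted only when the locale is actually installed.
if locale -a 2>/dev/null | grep -qi '^uk_UA\.utf'; then
    echo 'uk_UA.UTF-8 order:'
    tokens | LC_ALL=uk_UA.UTF-8 sort
else
    echo 'uk_UA.UTF-8 is not installed; run locale-gen uk_UA.UTF-8 first' >&2
fi
```

For these six tokens the byte-order sort puts чла before чль. That is not a general substitute for Ukrainian collation, since letters such as і, ї, є and ґ are encoded after я, but it gives a baseline to compare the locale's result against.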
So, the solution was carried out in these steps:
1. [...] glibc version (2.19 now)
2. [...] Makefile)
3. [...] /usr/share/i18n/locales/uk_UA up
4. [cd /usr/share/i18n/;] patch --dry-run -p2 < locales_uk_UA_softsign.diff, then again without --dry-run.
5. locale-gen
6. service postgresql restart
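The patch, regenerate and restart steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure used: the function names run and patch_uk_locale, the APPLY switch, the /tmp location of the patch file and the .orig backup name are all made up for illustration, and the backup copy is an added precaution. By default the script only prints the commands; set APPLY=1 and run it as root to execute them.

```shell
#!/bin/sh
# Sketch: apply the uk_UA locale patch, regenerate locales, restart PostgreSQL.

I18N_DIR=/usr/share/i18n
# Assumption: absolute path to the downloaded patch; adjust for your system.
PATCH_FILE=${PATCH_FILE:-/tmp/locales_uk_UA_softsign.diff}

# Print a command, and execute it only when APPLY=1.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    fi
}

patch_uk_locale() {
    # Added precaution (an assumption): keep a copy of the locale source.
    run cp -a "$I18N_DIR/locales/uk_UA" "$I18N_DIR/locales/uk_UA.orig" || return 1
    # Check that the patch applies cleanly before changing anything.
    run patch --dry-run -p2 -d "$I18N_DIR" -i "$PATCH_FILE" || return 1
    # Apply the patch for real.
    run patch -p2 -d "$I18N_DIR" -i "$PATCH_FILE" || return 1
    # Rebuild the compiled locales from the patched source.
    run locale-gen || return 1
    # Restart PostgreSQL, as in the last step above.
    run service postgresql restart
}

patch_uk_locale
```

Running it once without APPLY=1 shows the exact commands before anything under /usr/share/i18n is touched.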