
Ordering differences between Postgres instances on different machines (same locale)

I have two Postgres 9.1 instances: one local, installed via Postgres.app on OS X, and one remote, on Heroku. I've ensured that lc_collate is en_US.UTF-8 on both machines but am still seeing different behavior between the two.

On my local instance, SELECT 'i' > 'N' returns t whereas remotely it returns f. Given that I've already checked lc_* on both systems, what explains the difference I'm seeing?

asked Nov 13 '12 at 22:11 by jredburn


1 Answer

From the point of view of Unicode, the case ordering is a customization. Excerpt from http://www.unicode.org/reports/tr10:

Case Ordering. Some dictionaries and authors collate uppercase before lowercase while others use the reverse, so that preference needs to be customizable. Sometimes the case ordering is mandated by the government, as in Denmark. Often it is simply a customization or user preference.

Mac OS X simply has a different case ordering than the OS used by Heroku. On Mac OS X:

$ LC_ALL=en_US.UTF-8 sort << EOF
i
N
EOF

produces:

N
i

Running the same command on the same data on Ubuntu 12.04 produces:

i
N

This has nothing to do with PostgreSQL itself, except that PostgreSQL relies on the operating system for collation, so these unfortunate discrepancies between operating systems carry over into the database.
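To reproduce the comparison on any machine, the sketch below sorts the same two letters under the C locale (plain byte order, identical on every OS) and under en_US.UTF-8 (whatever the OS C library defines). The function name `collation_order` is hypothetical, and the locale name is an assumption: `locale -a` lists the locales actually installed, and a missing one may make sort fall back to C.

```shell
#!/bin/sh
# Hypothetical helper: print how the OS collates "i" and "N" under a locale.
# LC_ALL overrides LC_COLLATE, the category sort uses for ordering.
collation_order() {
  printf 'i\nN\n' | LC_ALL="$1" sort | paste -sd' ' -
}

# C is byte order everywhere; en_US.UTF-8 depends on the OS.
for loc in C en_US.UTF-8; do
  printf '%-12s %s\n' "$loc" "$(collation_order "$loc")"
done
```

The C line always reads `N i`, since "N" (0x4E) precedes "i" (0x69) in byte order. Going by the outputs above, the en_US.UTF-8 line shows `N i` on Mac OS X and `i N` on Ubuntu 12.04.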

PostgreSQL 10 and ICU

Starting with version 10, PostgreSQL can use collations provided by the ICU library on servers built with ICU support. Because these collations do not depend on the operating system's C library, they can sort consistently across operating systems.
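As a sketch, assuming a PostgreSQL 10 or later server built with ICU and a `psql` client whose connection settings (PGHOST, PGDATABASE, ...) already point at it, the comparison from the question can be run under an explicit ICU collation. The helper name `compare_with_collation` is hypothetical, and `en-x-icu` is assumed to be one of the ICU collations created at initdb time on such builds; `SELECT collname FROM pg_collation WHERE collprovider = 'i';` lists the ones actually present.

```shell
#!/bin/sh
# Hypothetical helper: run the question's comparison under a named collation.
# Assumes PostgreSQL 10+ built with ICU and psql already configured to connect.
compare_with_collation() {
  # -A: unaligned output, -t: tuples only, so psql prints just t or f
  psql -At -c "SELECT 'i' > 'N' COLLATE \"$1\";"
}

# Usage (requires a running server):
#   compare_with_collation en-x-icu
```

ICU's English rules compare letters ignoring case first, so "i" sorts before "N" and the query is expected to print f, whether the server runs on OS X or Linux.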

answered Dec 05 '22 at 09:12 by Daniel Vérité