Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL incorrect sorting

Tags:

sql

postgresql

I use PostgreSQL 9.3.3 and I have a table with one column named as title (character varying(50)).

When I have executed the following query:

select * from test
order by title asc

I got the following results:

#
A
#Example

Why "#Example" is in the last position? In my opinion "#Example" should be in the second position.

like image 704
user3364391 Avatar asked Mar 20 '14 13:03

user3364391


People also ask

What is default order in PostgreSQL?

The ORDER BY clause in PostgreSQL is used together with the SELECT statement to sort table data. The table data can either be sorted in ascending or descending order. By default, the data is sorted in ascending order.

What is collate Postgres?

The collation feature allows specifying the sort order and character classification behavior of data per-column, or even per-operation. This alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.

Does Postgres preserve insertion order?

The answer for this simple case is: "Yes".


2 Answers

Sort behaviour for text (including char and varchar as well as the text type) depends on the current collation of your locale.

See previous closely related questions:

  • PostgreSQL Sort
  • https://stackoverflow.com/q/21006868/398670

If you want to do a simplistic sort by ASCII value, rather than a properly localized sort following your local language rules, you can use the COLLATE clause

select * 
from test
order by title COLLATE "C" ASC

or change the database collation globally (requires dump and reload, or full reindex). On my Fedora 19 Linux system, I get the following results:

regress=> SHOW lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row)

regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#')) 
          SELECT title FROM v ORDER BY title ASC;
 title 
-------
 #
 a
 #a
 a#
 a#a
(5 rows)

regress=> WITH v(title) AS (VALUES ('#a'), ('a'), ('#'), ('a#a'), ('a#')) 
          SELECT title FROM v ORDER BY title COLLATE "C" ASC;
 title 
-------
 #
 #a
 a
 a#
 a#a
(5 rows)

PostgreSQL uses your operating system's collation support, so it's possible for results to vary slightly from host OS to host OS. In particular, at least some versions of Mac OS X have significantly broken unicode collation handling.

like image 145
Craig Ringer Avatar answered Nov 16 '22 01:11

Craig Ringer


It seems, that when sorting Oracle as well as Postgres just ignore non alpha numeric chars, e.g.

  select '*' 
   union all
  select '#' 
   union all
  select 'A'
   union all
  select '*E'
   union all
  select '*B'
   union all
  select '#C'
   union all
  select '#D'
order by 1 asc

returns (look: that DBMS doesn't pay any attention on prefix before 'A'..'E')

  *
  #
  A
  *B
  #C
  #D
  *E

In your case, what Postgres actually sorts is

'', 'A' and 'Example'

If you put '#' in the middle od the string, the behaviour will be the same:

  select 'A#B'
   union all
  select 'AC'
   union all
  select 'A#D'  
   union all
  select 'AE' 
order by 1 asc

returns (# ignored, and so 'AB', 'AC', 'AD' and 'AE' actually compared)

  A#B
  AC
  A#D
  AE

To change the comparison rules you should use collation, e.g.

  select '#' collate "POSIX"
   union all
  select 'A' collate "POSIX"
   union all
  select '#Example' collate "POSIX"
order by 1 asc

returns (as it required in your case)

  #
  #Example
  A
like image 40
Dmitry Bychenko Avatar answered Nov 16 '22 00:11

Dmitry Bychenko