I'm using Postgresql 9.4.5. When I go to psql and run \l
I get
Encoding is UTF8
Collate is en_US.UTF-8
cCtype is en_US.UTF-8
I have products
table with a name
column that has the following names:
T-700A Grouped
T-700 AGrouped
T-700A Halved
T-700 Whole
When I execute the following SQL in pql
SELECT name FROM products WHERE name LIKE '%T-700%' ORDER By name ASC;
I get the following output
T-700A Grouped
T-700 AGrouped
T-700A Halved
T-700 Whole
That sorting doesn't look natural. I expected to get
T-700 AGrouped
T-700 Whole
T-700A Grouped
T-700A Halved
It doesn't seem like Postgres is handling spaces the way I expected. Can anyone explain what is happening and suggest a way to fix this?
You cannot to change these values for already created databases. In this moment, when there are not other databases, the most easy solution is a) stop database, b) delete data directory, c) run manually initdb with options --encoding and --locale (run this command under postgres user).
In PostgreSQL, the TRIM() function is used to remove the longest string consisting of spaces or any specified character from a string. By default, the TRIM() function removes all spaces (' ') if not specified explicitly.
The collation feature allows specifying the sort order and character classification behavior of data per-column, or even per-operation. This alleviates the restriction that the LC_COLLATE and LC_CTYPE settings of a database cannot be changed after its creation.
On Unix/Linux SE, a friendly expert explained that what you see is the proper way to sort Unicode. Basically, the standard is trying to sort:
di Silva Fred di Silva Fred
di Silva John diSilva Fred
diSilva Fred disílva Fred
diSilva John -> di Silva John
disílva Fred diSilva John
disílva John disílva John
Now if spaces were as important as letters, the sort could not separate the various identical spellings of Fred and John. So what happens is that it first sorts without spaces. Then in a second pass, strings that are the same without whitespace are sorted. (This is a simplification, the real algorithm looks fairly complex, assigning whitespace, accents and non-printable characters various levels of precedence.)
You can bypass the Unicode collation by setting:
export LC_ALL=C
Or in Postgres by casting to byte array for sorting:
order by name::bytea
Or (from Kiln's answer) by specifying the C
collation:
order by name collate "C"
Or by altering the default collation for the column:
alter table products alter column name type text collate "C";
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With