Postgres ordering of UTF-8 characters

Tags:

postgresql

utf-8

I'm building a small app that includes Esperanto words in my database, so I have words like ĉapelojn and brakhorloĝo, with "special" characters.

Using PostgreSQL 9.4.4, I have a words table with the following schema:

lingvoj_dev=# \d words
                                      Table "public.words"
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+----------------------------------------------------
 id          | integer                     | not null default nextval('words_id_seq'::regclass)
 translated  | character varying(255)      |
 meaning     | character varying(255)      |
 times_seen  | integer                     |
 inserted_at | timestamp without time zone | not null
 updated_at  | timestamp without time zone | not null
Indexes:
    "words_pkey" PRIMARY KEY, btree (id)

But the following query gives some strange output:

lingvoj_dev=# SELECT w."translated" FROM "words" AS w ORDER BY w."translated" desc limit 10; 
translated
------------
 ĉu
 ŝi
 ĝi
 ĉevaloj
 ĉapelojn
 ĉapeloj
 ĉambro
 vostojn
 volas
 viro
(10 rows)

The ordering is inconsistent - I'd be okay with all of the words starting with special characters being at the end, but all of the words starting with ĉ should be grouped together and they're not! Why do ŝi and ĝi come in between ĉu and ĉevaloj?

The server encoding is UTF8, and the collation is en_AU.UTF-8.

Edit: It looks like it's treating all of the special characters as equivalent and ordering by the second character in each word. How do I make PostgreSQL see that ĉ, ŝ and ĝ are not equivalent?

asked Sep 18 '15 11:09

sevenseacat

1 Answer

"I'd be okay with all of the words starting with special characters being at the end..."

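Use COLLATE "C", which compares strings byte by byte instead of applying the locale's rules, so ĉ, ĝ and ŝ are treated as distinct characters that sort after the ASCII letters: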

SELECT w."translated" 
FROM "words" AS w 
ORDER BY w."translated" collate "C" desc limit 10;
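To see the effect without touching the words table, here is a minimal sketch using inline sample rows taken from the question's output. The alias t and the column name word are only illustrative. In a UTF8 database this should return ŝi, ĝi, ĉu, ĉevaloj, viro, because the order follows the encoded byte values (ŝ > ĝ > ĉ > any ASCII letter).

```sql
-- Inline sample rows; "t" and "word" are hypothetical names for this demo.
SELECT t.word
FROM (VALUES ('ĉu'), ('ŝi'), ('ĝi'), ('ĉevaloj'), ('viro')) AS t(word)
-- "C" collation compares the raw bytes of each string.
ORDER BY t.word COLLATE "C" DESC;
```

Words that share the same first letter now stay together, and ties on that letter are broken by the following characters.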

See also the related question "Different behaviour in 'order by' clause: Oracle vs. PostgreSQL".

Adding a COLLATE clause to every query can be awkward when using an ORM. One solution is to recreate the database with the LC_COLLATE = C option, as the OP suggested in a comment. Another option is to change the collation of a single column:

ALTER TABLE "words" ALTER COLUMN "translated" TYPE text COLLATE "C";
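For the database-wide option mentioned above, a rough sketch might look like the following. The database name lingvoj_c is hypothetical, and keeping LC_CTYPE at en_AU.UTF-8 is an assumption based on the locale given in the question. The existing data would still need to be dumped and restored into the new database.

```sql
-- Hypothetical database name. template0 is needed because the collation
-- differs from the one used by the default template (template1).
CREATE DATABASE lingvoj_c
    TEMPLATE template0
    ENCODING 'UTF8'
    LC_COLLATE 'C'            -- byte-wise ordering for every text column
    LC_CTYPE 'en_AU.UTF-8';   -- assumed: keep the existing character classification
```

With this in place no per-query or per-column COLLATE clause is needed, at the cost of losing locale-aware ordering everywhere in that database.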

answered Oct 13 '22 08:10

klin
