PostgreSQL UTF-8 binary collation

Tags: postgresql, utf-8, collation

I would like to have a collation that orders the UTF-8 encoding of 0x1234 below that of 0x1235, regardless of the character mapping in the Unicode standard. MySQL uses utf8_bin for this. MSSQL apparently has BIN and BIN2 collations (http://msdn.microsoft.com/en-us/library/ms143350.aspx). While finding these was easy, I can't even find a list of the collations PostgreSQL supports, much less an answer to this specific question.


asked Oct 15 '11 15:10

chx

3 Answers

The C locale will do. UTF-8 is designed so that byte order is also code point order. This is not obvious, but consider how UTF-8 works:

Code point range  Byte 1    Byte 2    Byte 3    Byte 4
0000-007F         0xxxxxxx
0080-07FF         110xxxxx  10xxxxxx
0800-FFFF         1110xxxx  10xxxxxx  10xxxxxx
10000-10FFFF      11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
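To make the bit layout concrete, here is a minimal Python sketch (Python and the helper name utf8_encode are illustrative choices, not anything PostgreSQL provides). It fills the x bits from the most significant end, also handles the four-byte form (10000-10FFFF, lead byte 11110xxx), rejects surrogates, and is cross-checked against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one code point using the UTF-8 bit layout."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:
        return bytes([cp])                               # 0xxxxxxx
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6),                  # 110xxxxx: top 5 bits
                      0x80 | (cp & 0x3F)])               # 10xxxxxx: low 6 bits
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),                 # 1110xxxx: top 4 bits
                      0x80 | ((cp >> 6) & 0x3F),         # 10xxxxxx: middle 6 bits
                      0x80 | (cp & 0x3F)])               # 10xxxxxx: low 6 bits
    # Four-byte form for 10000-10FFFF
    return bytes([0xF0 | (cp >> 18),                     # 11110xxx: top 3 bits
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check against Python's built-in UTF-8 codec.
for cp in (0x41, 0xE9, 0x1234, 0x1235, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")

print(utf8_encode(0x1234).hex(" "))  # e1 88 b4
print(utf8_encode(0x1235).hex(" "))  # e1 88 b5
```

The two code points from the question share their first two bytes and differ only in the last byte, exactly where their low bits differ.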

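When sorting binary data (which is what the C locale does), the first differing byte determines the order. What we need to show is that if two code points differ, the first differing byte of their UTF-8 encodings is lower for the lower code point. If the code points fall in different ranges, the lead byte of the lower one is already lower, because the lead-byte patterns themselves are ordered (0xxxxxxx < 110xxxxx < 1110xxxx, and so on). Within the same range, the payload bits (the x's) appear in the same order as in the code point, most significant first, so the order is determined by literally the same bits as without encoding.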
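The argument can be checked exhaustively with a short Python sketch. It models the C locale's byte-wise comparison with Python's `bytes` comparison (lexicographic, byte by byte) and code point order with Python's `str` comparison; it does not exercise PostgreSQL itself. Because the ordering is transitive, checking every pair of adjacent scalar values is enough.

```python
def scalar_values():
    """All Unicode scalar values: 0..0x10FFFF minus the surrogates D800-DFFF."""
    yield from range(0x0000, 0xD800)
    yield from range(0xE000, 0x110000)


def byte_order_matches_code_point_order() -> bool:
    prev = None
    for cp in scalar_values():
        enc = chr(cp).encode("utf-8")
        # bytes compare lexicographically, byte by byte, like a C-locale sort
        if prev is not None and not prev < enc:
            return False
        prev = enc
    return True


print(byte_order_matches_code_point_order())  # True

# Whole strings: sorting by UTF-8 bytes equals sorting by code points
# (Python's default str ordering compares code points).
words = ["\u1235", "\u1234", "z", "\u00e9", "Z", "\U0001F600"]
assert sorted(words, key=lambda s: s.encode("utf-8")) == sorted(words)
```

Whole strings work too: UTF-8 is prefix-free, so the first differing byte always falls inside the first differing character.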

answered Sep 28 '22 11:09

chx

Sort order of text depends on lc_collate (not on the system locale!). The system locale only serves as the default when the database cluster is created without an explicit locale.

The behaviour you are expecting only works with locale C. Read all about it in the fine manual:

The C and POSIX collations both specify "traditional C" behavior, in which only the ASCII letters "A" through "Z" are treated as letters, and sorting is done strictly by character code byte values.

Emphasis mine. PostgreSQL 9.1 adds several new collation features, which might be exactly what you are looking for.
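To illustrate what "sorting is done strictly by character code byte values" means in practice, here is a small Python sketch that sorts by the raw UTF-8 bytes. The word list is made up, and the case-folded sort is only a crude stand-in for a case-insensitive ordering, not the rules of any real locale.

```python
words = ["apple", "Zebra", "\u00e9clair", "banana", "Apple"]

# Byte-value order of the UTF-8 encodings, as the C collation compares them:
# all ASCII uppercase (0x41-0x5A) sorts before all ASCII lowercase (0x61-0x7A),
# and non-ASCII letters (lead bytes >= 0xC2) sort after both.
c_order = sorted(words, key=lambda s: s.encode("utf-8"))
print(c_order)  # ['Apple', 'Zebra', 'apple', 'banana', 'éclair']

# Crude case-insensitive contrast (NOT a real locale's collation rules).
folded = sorted(words, key=str.casefold)
print(folded)   # ['apple', 'Apple', 'banana', 'Zebra', 'éclair']
```

The byte-value order is exactly the "binary" behaviour the question asks for, even though it looks odd for human-readable text.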

answered Sep 28 '22 09:09

Erwin Brandstetter

By default, Postgres uses the collation defined by the system locale at cluster creation.

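You might try ORDER BY encode(convert_to(column, 'UTF8'), 'hex'). encode() takes bytea, so a text column has to be converted to its UTF-8 bytes first.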
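Ordering by the hex encoding preserves byte order: every byte becomes exactly two characters from 0-9a-f, which sort in the same order as their numeric values in ASCII, and a byte prefix becomes a hex prefix. The Python sketch below models this with `bytes.hex()` (lowercase); the helper name utf8_hex is hypothetical. It assumes the stored bytes are the UTF-8 encoding of the text and that the hex strings are compared character by character in ASCII order, as the C collation does; it does not model other collations.

```python
def utf8_hex(s: str) -> str:
    """Hypothetical helper: lowercase hex of the UTF-8 bytes, two chars per byte."""
    return s.encode("utf-8").hex()


words = ["\u1235", "\u1234", "z", "Z", "\u00e9", "za"]

by_hex = sorted(words, key=utf8_hex)                       # compare hex strings
by_bytes = sorted(words, key=lambda s: s.encode("utf-8"))  # compare raw bytes
assert by_hex == by_bytes

print([utf8_hex(w) for w in by_hex])
# ['5a', '7a', '7a61', 'c3a9', 'e188b4', 'e188b5']
```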

answered Sep 28 '22 09:09

Ramon Poca
