
Install utf8 collation in PostgreSQL

Right now I can choose Encoding : UTF8 when creating a new DB in pgAdmin4 GUI.

But there is no option to choose utf8_general_ci as the collation or character type. When I run select * from pg_collation; I don't see any collation related to utf8_general_ci.

Coming from a MySQL background, I am confused. Do I have to install a utf8-like collation (e.g. utf8_general_ci, utf8_unicode_ci) in PostgreSQL 10 or in Windows 10?

I just want the PostgreSQL equivalent of the MySQL collation utf8_general_ci.

Thank you

slevin asked Dec 31 '17 18:12



People also ask

How do I get collation in PostgreSQL?

To find the collation of the database, you need to query pg_database : select datname, datcollate from pg_database; Here are the relevant pages of the PostgreSQL manual: http://www.postgresql.org/docs/current/static/infoschema-columns.html.

Does PostgreSQL support UTF-8?

The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code.

How do I change collate and Ctype in PostgreSQL?

You cannot change these values for an existing database. If the cluster does not contain any other databases yet, the easiest solution is to a) stop the database server, b) delete the data directory, and c) run initdb manually with the --encoding and --locale options (run this command as the postgres user).


1 Answer

utf8 is an encoding (how Unicode characters are represented as a series of bytes), not a collation (which character sorts before which).
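To make the distinction concrete, here is a small sketch that can be run in psql or the pgAdmin query tool. It assumes a database created with Encoding UTF8, the default hex bytea output, and (for the last query) a PostgreSQL 10+ build with ICU support. The inline VALUES list is made up purely for illustration.

```sql
-- Encoding: how a character is stored as bytes.
-- In a UTF8 database, 'é' is stored as the two bytes C3 A9.
SELECT convert_to('é', 'UTF8');   -- \xc3a9

-- Collation: which string sorts before which.
-- The built-in "C" collation compares raw byte values,
-- so uppercase 'B' (0x42) sorts before lowercase 'a' (0x61).
SELECT x
FROM (VALUES ('a'), ('B'), ('c')) AS v(x)
ORDER BY x COLLATE "C";           -- B, a, c

-- A linguistic collation orders the same strings alphabetically.
-- Requires ICU support (see the rest of this answer).
SELECT x
FROM (VALUES ('a'), ('B'), ('c')) AS v(x)
ORDER BY x COLLATE "und-x-icu";   -- a, B, c
```

The stored bytes are identical in the last two queries; only the comparison rules differ.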

I think the Postgres 10 collation equivalent of utf8_general_ci (or the more modern utf8_unicode_ci) is called und-x-icu. This is an "undefined" collation (not defined for any real-world language) provided by the ICU library. It sorts characters from most languages quite reasonably.
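As a rough sketch of how the collation might be used, it can be attached to a column or applied to a single query. The table name people and its columns are hypothetical, chosen only for this example. Note that in PostgreSQL 10 this affects sort order and range comparisons; equality (=) still compares strings byte for byte, so it is not case-insensitive the way MySQL's _ci collations are.

```sql
-- Hypothetical table: "people" and its columns are illustrative only.
-- The column-level COLLATE clause makes und-x-icu the default
-- collation for sorting and comparing this column.
CREATE TABLE people (
    id   serial PRIMARY KEY,
    name text COLLATE "und-x-icu"
);

-- Sorting uses the column's collation automatically.
SELECT name FROM people ORDER BY name;

-- The collation can also be applied per query, for example on a
-- column that was declared without a COLLATE clause.
SELECT name FROM people ORDER BY name COLLATE "und-x-icu";
```

The double quotes around the collation name are required because it contains hyphens.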

ICU support was added in PostgreSQL 10, so this collation isn't available in older PostgreSQL versions or when ICU support was disabled at compile time. Before that, Postgres relied on collation support provided by the operating system, which differs between operating systems.
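One way to check whether a given installation has ICU collations is to filter pg_collation by provider. This sketch assumes PostgreSQL 10 or later, where the collprovider column exists and the value 'i' marks ICU collations.

```sql
-- Confirm the server version first; collprovider exists from version 10 on.
SELECT version();

-- List ICU-provided collations. An empty result suggests the server
-- was built without ICU support.
SELECT collname, collprovider
FROM pg_collation
WHERE collprovider = 'i'
ORDER BY collname;

-- Check for the specific collation mentioned above.
SELECT collname
FROM pg_collation
WHERE collname = 'und-x-icu';
```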

Tometzky answered Sep 21 '22 01:09
