Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you change the character encoding of a postgres database?

I have a database that was set up with the default character set SQL_ASCII. I want to switch it to UNICODE. Is there an easy way to do that?

like image 742
Nick Retallack Avatar asked Feb 23 '11 12:02

Nick Retallack


People also ask

How do you change character type in PostgreSQL?

First, specify the name of the table to which the column you want to change belongs in the ALTER TABLE clause. Second, give the name of column whose data type will be changed in the ALTER COLUMN clause. Third, provide the new data type for the column after the TYPE keyword.

What is default encoding for Postgres?

ScaleGrid PostgreSQL deployments use UTF-8 as the default encoding on both client and server side. The template1 database is UTF-8 encoded, and uses the en_US. UTF-8 locale. By default any databases you create will also inherit this encoding.

How do you handle special characters in PostgreSQL?

Special character symbols are characters with a pre-defined syntactic meaning in PostgreSQL. They are typically disallowed from being used in identifier names for this reason, though as mentioned in the section on quoted identifiers, this restriction can usually be worked around with quotes if need be.

Does PostgreSQL support UTF 16?

Short answer, this is not directly possible as PostgreSQL only supports a UTF-8 character set. UTF-16 based formats like Java, JavaScript, Windows can contain half surrogate pairs which have no representation in UTF-8 or UTF-32. These may easily be created by sub-stringing a Java, JavaScript, VB.Net string.


1 Answers

First off, Daniel's answer is the correct, safe option.

For the specific case of changing from SQL_ASCII to something else, you can cheat and simply poke the pg_database catalogue to reassign the database encoding. This assumes you've already stored any non-ASCII characters in the expected encoding (or that you simply haven't used any non-ASCII characters).

Then you can do:

update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb' 

This will not change the collation of the database, just how the encoded bytes are converted into characters (so now length('£123') will return 4 instead of 5). If the database uses 'C' collation, there should be no change to ordering for ASCII strings. You'll likely need to rebuild any indices containing non-ASCII characters though.

Caveat emptor. Dumping and reloading provides a way to check your database content is actually in the encoding you expect, and this doesn't. And if it turns out you did have some wrongly-encoded data in the database, rescuing is going to be difficult. So if you possibly can, dump and reinitialise.

like image 193
araqnid Avatar answered Sep 24 '22 04:09

araqnid