Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PostgreSQL 9.1 installation and database encoding

I just did my first install of PostgreSQL 9.1 on Ubuntu 10.04.

note: I have done it a few times on Windows with an installer without issues.

After a bit of effort, I got it set up to connect remotely via pgAdminIII. However, I was really surprised after connecting to the db, that I got a warning about the encoding. The "postgres" database itself was created with "SQL_ASCII" encoding. Every time that I've installed on windows, it created the postgres DB with "UTF8" - which seems like it would be a lot better and would stop the warning message when opening up the database via pgAdminIII.

Is there something I did wrong? Is there an installation option/param to use to set the default encoding to use?

And is there anyway to fix this? I've read some things on the web that says that you need to dumb and restore to change the encoding of a database, but I'm not sure this is even possible on the postgres db. Is it?

Thanks for your help!

like image 785
David S Avatar asked Dec 06 '11 02:12

David S


People also ask

What is the default encoding for PostgreSQL?

ScaleGrid PostgreSQL deployments use UTF-8 as the default encoding on both client and server side. The template1 database is UTF-8 encoded, and uses the en_US. UTF-8 locale. By default any databases you create will also inherit this encoding.

Does PostgreSQL support UTF 16?

Short answer, this is not directly possible as PostgreSQL only supports a UTF-8 character set. UTF-16 based formats like Java, JavaScript, Windows can contain half surrogate pairs which have no representation in UTF-8 or UTF-32. These may easily be created by sub-stringing a Java, JavaScript, VB.Net string.

Does PostgreSQL support UTF-8?

The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code.


3 Answers

The relevant option is --locale=locale to the initdb command which initializes your database cluster. If you don't supply it explicitly it defaults to the system locale. (You probably run your Ubuntu on locale 'C'.)

Read more about it in the excellent manual here.

In PostgreSQL you can still sneak in a database with different locale by basing a new database off template0 instead of the default templeate1. I quote the manual here:

The encoding and locale settings must match those of the template database, except when template0 is used as template.

But I'd rather recreate the database cluster with the desired locale. Much cleaner.


Edit: info about available locales

You can only use locales that are provided by the operating system. I quote the manual here:

What locales are available on your system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252, but the principles are the same.

Look at locale-gen in a Unix-system, if you want to use a locale that has not yet been generated. The important thing to understand is that multiple locales can be installed in your OS, but only one of them can be picked for system parameters like LC_CTYPE, LC_COLLATE, etc. Look at the output of locale versus locale -a in the shell. Usually it is the same for all, set via LC_ALL.

@David: What you did may have solved your problem, but you could have had that easier. Also, be aware that the environment variable LANG only provides the default for all locale settings. If any of them is set to something different, LANG will be overridden. Set LC_ALL to override any existing setting. Here is one of many sites in the web telling you more about that.

To check all current locale settings of your database (cluster), run in your database:

SHOW ALL;

Or more specifically:

SELECT *
FROM   pg_settings
WHERE  name ~~ 'lc%';
like image 145
Erwin Brandstetter Avatar answered Oct 20 '22 17:10

Erwin Brandstetter


The answer provided by Erwin Brandstetter, was helpful, but for whatever reason, it didn't work for me. The reason is that I couldn't initdb to ever run. I kept getting a "bash: command not found" error when trying to run it with the locale. What I ended up doing was:

changing the locale of the OS. For me, this was:

$ update-locale LANG=en_US.UTF-8

Note: then I had to reboot the server. To confirm that it worked, just run:

$ locale

with that set, I stopped and dropped the cluster:

$ su postgres
$ pg_dropcluster --stop 9.1 main

note: main is the default cluster that got created for me (your cluster name might be different)

$ pg_createcluster --start 9.1 main

Again, I just recreated the cluster with the same name (main).

Note: because I picked the same cluster name (main), I had to go back and update my .conf files. For me, this specifically was postgres.conf and pg_hba.conf to re-enable remote access to the box. I'm not going to include how to do that here as there is lots of documentation on the web for this. But, if someone wants to edit this answer later to include it, that would be fine! :)

like image 28
David S Avatar answered Oct 20 '22 18:10

David S


Creating the database via initdb with the encoding option (-E or --encoding=) did the trick for me on Mac Os X Snow Leopard:

initdb -D <database_directory> -E UTF8

If you init the database this way it will create the database templates with the correct encoding also.

Refer to the initdb documentation here and the encoding documentation here to choose the correct encoding.

like image 37
Pedro Carriço Avatar answered Oct 20 '22 19:10

Pedro Carriço