I just did my first install of PostgreSQL 9.1 on Ubuntu 10.04.
note: I have done it a few times on Windows with an installer without issues.
After a bit of effort, I got it set up to connect remotely via pgAdminIII. However, I was really surprised after connecting to the db, that I got a warning about the encoding. The "postgres" database itself was created with "SQL_ASCII" encoding. Every time that I've installed on windows, it created the postgres DB with "UTF8" - which seems like it would be a lot better and would stop the warning message when opening up the database via pgAdminIII.
Is there something I did wrong? Is there an installation option/param to use to set the default encoding to use?
And is there anyway to fix this? I've read some things on the web that says that you need to dumb and restore to change the encoding of a database, but I'm not sure this is even possible on the postgres db. Is it?
Thanks for your help!
ScaleGrid PostgreSQL deployments use UTF-8 as the default encoding on both client and server side. The template1 database is UTF-8 encoded, and uses the en_US. UTF-8 locale. By default any databases you create will also inherit this encoding.
Short answer, this is not directly possible as PostgreSQL only supports a UTF-8 character set. UTF-16 based formats like Java, JavaScript, Windows can contain half surrogate pairs which have no representation in UTF-8 or UTF-32. These may easily be created by sub-stringing a Java, JavaScript, VB.Net string.
The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code.
The relevant option is --locale=locale
to the initdb command which initializes your database cluster. If you don't supply it explicitly it defaults to the system locale. (You probably run your Ubuntu on locale 'C'.)
Read more about it in the excellent manual here.
In PostgreSQL you can still sneak in a database with different locale by basing a new database off template0
instead of the default templeate1
. I quote the manual here:
The encoding and locale settings must match those of the template database, except when template0 is used as template.
But I'd rather recreate the database cluster with the desired locale. Much cleaner.
You can only use locales that are provided by the operating system. I quote the manual here:
What locales are available on your system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252, but the principles are the same.
Look at locale-gen
in a Unix-system, if you want to use a locale that has not yet been generated. The important thing to understand is that multiple locales can be installed in your OS, but only one of them can be picked for system parameters like LC_CTYPE
, LC_COLLATE
, etc. Look at the output of locale
versus locale -a
in the shell. Usually it is the same for all, set via LC_ALL
.
@David: What you did may have solved your problem, but you could have had that easier. Also, be aware that the environment variable LANG
only provides the default for all locale settings. If any of them is set to something different, LANG
will be overridden. Set LC_ALL
to override any existing setting. Here is one of many sites in the web telling you more about that.
To check all current locale settings of your database (cluster), run in your database:
SHOW ALL;
Or more specifically:
SELECT *
FROM pg_settings
WHERE name ~~ 'lc%';
The answer provided by Erwin Brandstetter, was helpful, but for whatever reason, it didn't work for me. The reason is that I couldn't initdb to ever run. I kept getting a "bash: command not found" error when trying to run it with the locale. What I ended up doing was:
changing the locale of the OS. For me, this was:
$ update-locale LANG=en_US.UTF-8
Note: then I had to reboot the server. To confirm that it worked, just run:
$ locale
with that set, I stopped and dropped the cluster:
$ su postgres
$ pg_dropcluster --stop 9.1 main
note: main is the default cluster that got created for me (your cluster name might be different)
$ pg_createcluster --start 9.1 main
Again, I just recreated the cluster with the same name (main).
Note: because I picked the same cluster name (main), I had to go back and update my .conf files. For me, this specifically was postgres.conf and pg_hba.conf to re-enable remote access to the box. I'm not going to include how to do that here as there is lots of documentation on the web for this. But, if someone wants to edit this answer later to include it, that would be fine! :)
Creating the database via initdb with the encoding option (-E or --encoding=) did the trick for me on Mac Os X Snow Leopard:
initdb -D <database_directory> -E UTF8
If you init the database this way it will create the database templates with the correct encoding also.
Refer to the initdb documentation here and the encoding documentation here to choose the correct encoding.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With