A) What is the best solution for regularly backing up large PostgreSQL database (version 8.3 running on latest Ubuntu server); please don't say pg_dump with those painfully slow insert statements
B) What is the best solution for PostgreSQL database replication that works in real world
I think there is only one answer to that one.
PITR, or point in time recovery. It is basically archiving of transaction logs, and is as far as I know, the best way to do backups.
I have set it up a couple of times for 8.1, but it should be the same in 8.3.
In the postgresql.conf all you need to do is to add this:
archive_command = 'test ! -f /path/to/your/backups/archive_logs/%f && cp -i %p /path/to/your/backups/archive_logs/%f </dev/null'
This command copies the archive logs to the specified directory, where you safely can back it up with the backup software of your choice.
To make a full backup, you need to first tell PostgreSQL that you're taking a backup.
It is being done through the psql command psql "SELECT pg_start_backup('my_backup');"
After that just copy the data dir with rsync, cpio or some other tool. If the database is heavily used, the files will change during the copy, so it is important that the tool can handle that correctly and not bail out.
After the copy is finished, just run psql "SELECT pg_stop_backup();"
to tell PostgreSQL to stop it again. What those commands do is putting a marker in the Archive logs where the backup started, so in a restore, it knows from where it needs to start reading from in there.
This technique can also be used to have a warm standby for replication, but it will not be readable, just ready to take over in case of emergency. Full hot standby is planned in I think version 8.4, so until then I don't think there is another option.
One thing that is great if you use PITR, is that you can specify a timestamp to when you want the archive logs to be appended. So it can also save the database from accidents (like removing or changing some data)
A. pg_dump doesn't use insert statements by default. It will use the COPY command by default. The command line switch of -d or --inserts will cause pg_dump to put insert statements in the export. If you have either of these switches in your pg_dump command, just remove them to have pg_dump use COPY.
B. In the upcoming version of Postgres, they are going to have simple replication out of the box. I think the 8.4 release is planned soon. So, it might be worth wating for that, if possible.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With