The way I see it dumping a PostgeSQL DB into one big SQL file and then committing and pushing to a remote Git repo can be a terrific backup solution: I get a history of all versions, hashing, secure transport, one-way (really hard to mess up and delete data by pushing), efficient storage (assuming no binaries) and no chance of a new image corrupting the backup (which is the risk with rsync).
Has anybody used this approach, especially with pg, and can share his/her experience? Pitfalls?
Here is the full script details on how to do this for postgres.
The scripts presume the existence of a user called 'backup' that has access to either all (superuser) or the specific database. The credentials are stored in the .pgpass file in the home directory. That file looks like this (presuming the password is 'secret').
~/.pgpass
*:*:*:backup:secret
Make sure you set the correct security on .pgpass or it will be ignored
chmod 0600 ~/.pgpass
This dumps a specific database.
backup.sh
pg_dump dbname -U backup > backup.sql
git add .
git commit -m "backup"
git push origin master
Note: you probably don't want to use any file splitting options for the DB dump since any insertion/deletion will cause a 'domino' effect and change all files creating more deltas/changes in git.
This script will dump the entire database cluster (all databases):
pg_dumpall -U backup > backup.sql
git add .
git commit -m "backup"
git push origin master
Note: you probably don't want to use any file splitting options for the DB dump since any insertion/deletion will cause a 'domino' effect and change all files creating more deltas/changes in git.
The final step is to add this to a cron job. So, 'crontab -e' and then add something like the following (runs every day at midnight)
# m h dom mon dow command
# run postgres backup to git
0 0 * * * /home/ubuntu/backupdbtogit/backup.sh
If you need to restore the database, you'll checkout the version you want to restore then pass to pg. (more details on that here http://www.postgresql.org/docs/8.1/static/backup.html#BACKUP-DUMP-RESTORE )
for a single database:
psql dbname < infile
for the entire cluster
psql -f infile postgres
None of this was particularly complicated, but it's always tedious looking up all the parts.
I experienced an issue with git failing on a push. This was due to git using a lot of memory - several commits had backed up. I resolved the failure by mounting the server git repo on my local machine (which has plenty of RAM). I mounted the server disk using sshfs and then committed from my workstation machine. After I did this, the low memory server resumed commits without a problem.
A better alternative is to limit the memory usage of git during the pack (from Is there a way to limit the amount of memory that "git gc" uses? ).
git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"
Note: I have not tried setting a memory limit yet, since I have not had the push failure problem again.
Generally, you ought to use a backup tool for doing backups, and a version control tool to do version control. They are similar, but not the same.
Some people mix the two, where for example essentially whatever is in the database is the version, and that doesn't have to be wrong, but be clear about what you want.
If you're talking about just the schema, then you probably can't do much wrong with "backups" using Git. But if you want to back up the data, then things can get complicated. Git isn't very good with large files. You could use something like git-annex to address that, but then you need a separate backup mechanism to create the external files. Also, using "proper" backup methods such as pg_dump or WAL archiving give other advantages, such as being able to restore subsets of databases or doing point-in-time recovery.
You probably also want to back up other parts of an operating system. How do you do that? Preferrably not with a version control system, because they don't preserve file permissions, timestamps, and special files so well. So it would make some sense to tie your database backup into your existing backup system.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With