
Backing up a DB with Git - a good idea?

The way I see it, dumping a PostgreSQL DB into one big SQL file and then committing and pushing it to a remote Git repo could be a terrific backup solution: I get a history of all versions, hashing, secure transport, one-way operation (it's really hard to mess up and delete data by pushing), efficient storage (assuming no binaries), and no chance of a new image corrupting the backup (which is the risk with rsync).

Has anybody used this approach, especially with Postgres, and can share their experience? Any pitfalls?

Assaf Lavie asked Apr 27 '11

2 Answers

Here are the full details of the scripts for doing this with Postgres.

Create a Backup User

The scripts presume the existence of a user called 'backup' that has access either to all databases (superuser) or to the specific database. The credentials are stored in the .pgpass file in the home directory. That file looks like this (presuming the password is 'secret'):

~/.pgpass

*:*:*:backup:secret

Make sure you set the correct permissions on .pgpass, or it will be ignored:

chmod 0600 ~/.pgpass
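As a quick sanity check, the permission requirement can be demonstrated in isolation. This sketch writes the example entry to a temporary file rather than a real ~/.pgpass (the 'backup'/'secret' credentials are the placeholders from above); libpq ignores the file if it is group- or world-readable.

```shell
# Write the example .pgpass entry to a temp file so a real ~/.pgpass
# is not clobbered; in production the path would be "$HOME/.pgpass".
PGPASS_FILE="$(mktemp)"
printf '*:*:*:backup:secret\n' > "$PGPASS_FILE"

# libpq requires 0600 (or stricter) before it will use the file
chmod 0600 "$PGPASS_FILE"
PERMS="$(stat -c '%a' "$PGPASS_FILE")"   # GNU stat; use 'stat -f %Lp' on BSD/macOS
```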

Backup a Single Database

This dumps a specific database.

backup.sh

#!/bin/bash
cd /home/ubuntu/backupdbtogit   # the Git repo that holds the dump
pg_dump dbname -U backup > backup.sql
git add .
git commit -m "backup"
git push origin master

Note: you probably don't want to use any file-splitting options for the DB dump, since any insertion or deletion will cause a 'domino' effect and shift the contents of all subsequent files, creating more deltas/changes in Git.
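One wrinkle with running this from cron: on a day when nothing changed, `git commit` exits non-zero because there is nothing to commit. A guard around the commit avoids that. Below is a sketch using a throwaway repo and an echo stub in place of pg_dump, so the git flow can be checked without a live database; the stub and the temp paths are assumptions, not part of the original script.

```shell
set -euo pipefail

REPO="$(mktemp -d)"                       # stand-in for /home/ubuntu/backupdbtogit
git -C "$REPO" init -q
git -C "$REPO" config user.email backup@example.com
git -C "$REPO" config user.name backup

# first run: stub dump in place of 'pg_dump dbname -U backup > backup.sql'
echo "-- fake dump" > "$REPO/backup.sql"
git -C "$REPO" add .
if ! git -C "$REPO" diff --cached --quiet; then
    git -C "$REPO" commit -qm "backup $(date -u +%F)"
fi

# second run with an identical dump: the guard skips the empty commit
echo "-- fake dump" > "$REPO/backup.sql"
git -C "$REPO" add .
if ! git -C "$REPO" diff --cached --quiet; then
    git -C "$REPO" commit -qm "backup $(date -u +%F)"
fi

COMMITS="$(git -C "$REPO" rev-list --count HEAD)"   # 1, not 2
# a real script would finish with: git push origin master
```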

Backup all Databases on this machine

This script will dump the entire database cluster (all databases):

#!/bin/bash
cd /home/ubuntu/backupdbtogit   # the Git repo that holds the dump
pg_dumpall -U backup > backup.sql
git add .
git commit -m "backup"
git push origin master

The same note about file-splitting options applies here.

Schedule it to Run

The final step is to add this to a cron job: run 'crontab -e' and then add something like the following (runs every day at midnight):

# m h  dom mon dow   command
# run postgres backup to git
0 0 * * * /home/ubuntu/backupdbtogit/backup.sh

Restore

If you need to restore the database, check out the version you want to restore and then feed it to psql (more details on that here: http://www.postgresql.org/docs/8.1/static/backup.html#BACKUP-DUMP-RESTORE ).

For a single database:

psql dbname < backup.sql

For the entire cluster:

psql -f backup.sql postgres
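The "check out the version you want" step can be sketched end to end with a throwaway repo: two fake dumps are committed, and the older one is pulled back into the working tree. In production you would pick the commit from `git log` and then run `psql dbname < backup.sql` on the checked-out file; the stub dump contents here are assumptions.

```shell
set -euo pipefail

REPO="$(mktemp -d)"                 # stand-in for the real backup repo
git -C "$REPO" init -q
git -C "$REPO" config user.email backup@example.com
git -C "$REPO" config user.name backup

echo "-- dump from day 1" > "$REPO/backup.sql"
git -C "$REPO" add . && git -C "$REPO" commit -qm "backup day 1"
echo "-- dump from day 2" > "$REPO/backup.sql"
git -C "$REPO" commit -qam "backup day 2"

# find the older commit (normally you'd browse 'git log --oneline -- backup.sql')
OLD="$(git -C "$REPO" rev-list --max-parents=0 HEAD)"

# restore just backup.sql from that commit into the working tree
git -C "$REPO" checkout "$OLD" -- backup.sql

head -n1 "$REPO/backup.sql"          # prints: -- dump from day 1
# then: psql dbname < backup.sql     (or psql -f backup.sql postgres for a cluster dump)
```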

None of this was particularly complicated, but it's always tedious looking up all the parts.


Crashing on Server with limited RAM

I experienced an issue with git failing on a push. This was due to git using a lot of memory: several commits had backed up unpushed. I resolved the failure by mounting the server's git repo on my local machine (which has plenty of RAM) using sshfs, and then committing from my workstation. After I did this, the low-memory server resumed commits without a problem.

A better alternative is to limit git's memory usage during packing (from "Is there a way to limit the amount of memory that 'git gc' uses?"):

git config --global pack.windowMemory "100m"
git config --global pack.packSizeLimit "100m"
git config --global pack.threads "1"

Note: I have not tried setting a memory limit yet, since I have not had the push failure problem again.

Tom Carchrae answered Oct 15 '22


Generally, you ought to use a backup tool for doing backups, and a version control tool to do version control. They are similar, but not the same.

Some people mix the two, where, for example, whatever is in the database essentially is the version; that doesn't have to be wrong, but be clear about what you want.

If you're talking about just the schema, then you probably can't do much wrong with "backups" using Git. But if you want to back up the data, then things can get complicated. Git isn't very good with large files. You could use something like git-annex to address that, but then you need a separate backup mechanism to create the external files. Also, using "proper" backup methods such as pg_dump or WAL archiving give other advantages, such as being able to restore subsets of databases or doing point-in-time recovery.

You probably also want to back up other parts of the operating system. How do you do that? Preferably not with a version control system, because they don't preserve file permissions, timestamps, and special files very well. So it would make some sense to tie your database backup into your existing backup system.

Peter Eisentraut answered Oct 15 '22