
How can I backup a 13 GB SVN repository? The dump is 100+ GB

I have about a dozen repositories that are 1 GB to 10 GB in size on the file system, and I need to set up automated backups for all of them on our XP 64-bit machines (our old backup scripts were lost when a computer went down).

After reading this question about the best way to back up SVN repos, I started dumping the biggest repo we have, which is about 13 GB. This command has been executing for ~2.5 hours now, and it's currently dumping revision ~200 of 300+.

svnadmin dump --deltas \\path\to\repo\folder > \\path\to\backup\folder\dump.svn

The dump file is over 100 GB and counting. I know I can 7-zip this sucker, but 100 GB?! ... o_O

The repositories contain a large amount of binary data, which could be part of the problem, but as of right now, switching to a more efficient version control system (assuming there is one) is not realistic; SVN is a part of life here.

I've considered using hotcopy, which takes up a lot less space, but when I tried using one of our old hotcopy-ed backups, Subversion 1.7 couldn't find a bunch of files it needed. It seems that I'd have to install the version of SVN that originally hotcopy-ed the repo, and then dump that repo to get it into a newer SVN. This statement seems to confirm the problem I'm having with hotcopy: http://svn.haxx.se/users/archive-2005-05/0842.shtml

I feel like I've just got to be missing something. Maybe there's some flag for dump that magically makes the dump 1/5 the size...

Do I have any other options?


UPDATE: The last revision, #327, was just dumped. The final size of the dump file is 127 GB. That's from a 13.5 GB repo. I probably have roughly three times that much in all of my repositories combined.

Logical Fallacy asked Sep 19 '12 20:09



3 Answers

For daily backup I would say you really don't need to do an svnadmin dump. I would use the dump method if you were about to transfer the repository to a new server which may be running a different SVN version and OS as it's the most portable way of dumping the repository, but it's not very space-efficient.
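
If a dump is still wanted (for a migration, say), one way to avoid the huge intermediate file is to compress the stream as it is produced instead of 7-zipping it afterwards. Below is a minimal sketch in Python, which is my choice and not something from the question. The function names are hypothetical, and how much gzip saves on mostly binary data is not something I can promise.

```python
import gzip
import shutil
import subprocess


def compress_stream(src, dest_path, chunk_size=1024 * 1024):
    """Copy a binary file-like object into a gzip file, chunk by chunk."""
    with gzip.open(dest_path, "wb") as out:
        shutil.copyfileobj(src, out, chunk_size)


def dump_compressed(repo_path, dest_path):
    """Run `svnadmin dump --deltas` and gzip its stdout on the fly."""
    proc = subprocess.Popen(
        ["svnadmin", "dump", "--deltas", "--quiet", repo_path],
        stdout=subprocess.PIPE,
    )
    # The uncompressed dump only ever exists in the pipe, never on disk.
    compress_stream(proc.stdout, dest_path)
    proc.stdout.close()
    if proc.wait() != 0:
        raise RuntimeError("svnadmin dump failed for " + repo_path)
```

Calling dump_compressed(r"\\path\to\repo\folder", r"\\path\to\backup\folder\dump.svn.gz") would write the compressed dump directly. To restore, the file has to be decompressed and fed to svnadmin load.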

I'd recommend using the hotcopy method referred to in that link. That will guarantee that the state of the filesystem is consistent, and it will also copy the configuration files and hook scripts (incidentally, svnadmin dump doesn't copy these, so you'd end up with an incomplete backup). Because it's just a direct copy of the repository, it's the same size as the original, so the backup should be much more manageable.

In an emergency, if you need to restore a backup made with hotcopy, all you should need is a machine with the same SVN release series (e.g. 1.6 or 1.7) and, to be safe, the same OS. You should then be able to use this repository directly, or you can do an svnadmin dump at that point to transfer it to a new server.
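
As a rough illustration of automating this, here is a sketch in Python that hotcopies a repository into a dated folder and then runs svnadmin verify on the copy. The folder naming scheme and the function names are my own assumptions, not anything SVN prescribes.

```python
import datetime
import os
import subprocess


def hotcopy_target(backup_root, repo_name, day):
    """Return a dated destination such as <backup_root>/<repo_name>-2012-09-19."""
    return os.path.join(backup_root, "{}-{}".format(repo_name, day.isoformat()))


def hotcopy_command(repo_path, dest_path):
    """Build the argv for `svnadmin hotcopy REPOS_PATH NEW_REPOS_PATH`."""
    return ["svnadmin", "hotcopy", repo_path, dest_path]


def backup_repo(repo_path, backup_root, day=None):
    """Hotcopy one repository into a dated folder, then verify the copy."""
    day = day or datetime.date.today()
    repo_name = os.path.basename(os.path.normpath(repo_path))
    dest = hotcopy_target(backup_root, repo_name, day)
    subprocess.check_call(hotcopy_command(repo_path, dest))
    # Fail loudly if the copy is not a readable repository.
    subprocess.check_call(["svnadmin", "verify", "--quiet", dest])
    return dest
```

Running backup_repo once per repository from a scheduled task would give one verified copy per night. Each copy is as large as the repository itself, so old copies need to be cleaned up separately.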

EDIT: comparison of svnsync and hotcopy:

Common aspects:

  • Safely deals with repository writes during backup
  • Size of backup = size of repository

Advantages of hotcopy:

  • Easier to set up
  • Backs up hooks and config files

Advantages of svnsync:

  • Allows backup onto a different machine
  • Only revisions added since the last sync are written, so each sync is very quick and the backups are effectively compact and incremental
the_mandrill answered Sep 20 '22 23:09


Thanks to the suggestions of bahrep and the_mandrill, I decided to go with svnsync for these repositories. I was able to get it set up quite easily, and since we don't have any hooks or config files, there's nothing else to back up. Because of the problems I had with hotcopy (thanks to the_mandrill for proposing a solution to these issues) I decided that svnsync would be the simpler solution for us.

In addition to what the_mandrill pointed out, svnsync has other advantages:

  • In the event that the main repository goes down, users can check out from the backup repository as long as they have its URL.
  • The backups are fully version-controlled. My boss asked me to do nightly backups, but to keep only the backups from the past week. To do that with hotcopy, I'd have to write a script. With svnsync, I don't have to worry about any of that.
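
For anyone who stays with hotcopy, the cleanup script mentioned in the last bullet might look roughly like this. It is only a sketch in Python, and it assumes each nightly copy lives in a folder whose name ends in an ISO date, such as repo-2012-09-19. That naming scheme and the function names are hypothetical.

```python
import datetime
import os
import re
import shutil

# Matches a trailing "-YYYY-MM-DD" on a backup folder name.
DATE_SUFFIX = re.compile(r"-(\d{4})-(\d{2})-(\d{2})$")


def expired_backups(names, today, keep_days=7):
    """Return the folder names whose date suffix is more than keep_days old."""
    cutoff = today - datetime.timedelta(days=keep_days)
    expired = []
    for name in names:
        match = DATE_SUFFIX.search(name)
        if not match:
            continue  # not one of our dated backups; leave it alone
        try:
            taken = datetime.date(*map(int, match.groups()))
        except ValueError:
            continue  # looks like a date but is not a valid one
        if taken < cutoff:
            expired.append(name)
    return expired


def prune(backup_root, keep_days=7):
    """Delete expired backup folders under backup_root."""
    today = datetime.date.today()
    for name in expired_backups(os.listdir(backup_root), today, keep_days):
        shutil.rmtree(os.path.join(backup_root, name))
```

One caveat: on Windows, read-only files inside a copied repository may make shutil.rmtree fail, so an error handler that clears the read-only attribute may be needed.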

To set up svnsync, I had to complete the following steps. Excuse any typos. All of our repositories are hosted using VisualSVN Server.

  1. Create a new, empty repository:

    svnadmin create \\computerB\C$\repositories\mirror

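  2. Create the file \mirror\hooks\pre-revprop-change.bat. svnsync needs to change revision properties on the mirror, and Subversion refuses that unless this hook exists and succeeds. Its only content is this one line: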

    exit 0

  3. Initialize the sync (destination URL first, then source URL):

    svnsync init https://computerB.domain.net/svn/mirror https://computerA.domain.net/svn/repo

  4. Synchronize the two repos:

    svnsync synchronize https://computerB.domain.net/svn/mirror https://computerA.domain.net/svn/repo
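
Once the mirrors are initialized, the nightly job only has to repeat step 4 for each one. Here is a minimal sketch in Python of such a loop. The MIRRORS list is a placeholder and the function names are hypothetical. It passes only the destination URL, which svnsync synchronize accepts because the source was recorded in the mirror by svnsync init.

```python
import subprocess

# Placeholder list: one mirror URL per repository being backed up.
MIRRORS = [
    "https://computerB.domain.net/svn/mirror",
]


def sync_command(mirror_url):
    """Build the argv for `svnsync synchronize DEST_URL`."""
    return ["svnsync", "synchronize", "--non-interactive", mirror_url]


def sync_all(mirrors, run=subprocess.call):
    """Sync every mirror in turn; return the URLs whose sync failed."""
    failed = []
    for url in mirrors:
        # A non-zero exit code means this mirror did not sync cleanly.
        if run(sync_command(url)) != 0:
            failed.append(url)
    return failed
```

A scheduled task could call sync_all(MIRRORS) and send an alert if the returned list is not empty, so that one broken mirror doesn't stop the others from syncing.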

Logical Fallacy answered Sep 18 '22 23:09


Beginning with VisualSVN Server 3.6, you can use the Backup-SvnRepository PowerShell cmdlet to make a backup of a Subversion repository. To restore the repository from a backup, use the Restore-SvnRepository cmdlet.

What is more, the Enterprise Edition of the server offers a scheduled backup feature. The built-in scheduled backup supports several backup types including incremental backups that are efficient in terms of storage space and time required to take the backup.

bahrep answered Sep 21 '22 23:09