 

Can I use Subversion for a multi gigabyte data set?

The data set is 97,984 files in 6,766 folders, about 2.57 GB in total. A lot of them are binary files.

To me this doesn't sound like much. The daily change rate is in the hundreds of KB, spread over maybe 50 files. But I'm worried that Subversion will become extremely slow.

Subversion was never fast anyway, and the last time I looked (around v1.2) the recommendation was to split the data into multiple repositories. No, I don't want to do that.

Is there a way to tell Subversion, or any other free open-source version control system, to trust the file modification time and file size to detect changes instead of comparing the contents of every file? Combined with putting the data on a fast modern SSD, that should make it fast, say, less than 6 seconds for a complete commit (that's 3x longer than getting the summary from the Windows Explorer properties dialog).
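For what it's worth, both Subversion working copies and Git's index already lean on stat information (size and timestamps) before reading file contents, so a status scan shouldn't re-read every file. Below is a small shell sketch of how one might check that on a large tree; the paths are placeholders, and the Git settings are just suggestions whose benefit varies, not a confirmed recipe for this exact data set.

    # SVN consults the working-copy metadata (timestamps/sizes) first
    # and only re-reads files whose stat info has changed.
    time svn status /path/to/working-copy

    # Git behaves similarly via its index; on large trees these
    # settings can speed up "git status" further.
    git config core.preloadIndex true      # stat files in parallel
    git config core.untrackedCache true    # cache the untracked-file scan
    time git status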

asked Jan 24 '23 by Lothar


1 Answer

I've just done a benchmark on my machine to see what this is like:

Data size: 2.3 GB (84,000 files in 6,000 directories, random textual data)
Checkout time: 14 minutes
Changed 500 files (14 MB of data changed)
Commit time: 50 seconds
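For reference, the setup was roughly along these lines; the paths and file sizes below are illustrative rather than the exact commands I ran.

    # Create a test repository and an initial working copy
    svnadmin create /tmp/repo
    svn checkout -q file:///tmp/repo /tmp/wc
    cd /tmp/wc

    # Generate random textual data: ~14 files in each of ~6,000 directories
    for d in $(seq 1 6000); do
        mkdir "dir$d"
        for f in $(seq 1 14); do
            head -c 30000 /dev/urandom | base64 > "dir$d/file$f.txt"
        done
    done
    svn add -q --force .
    svn commit -q -m "initial import"

    # A fresh checkout gives the checkout-time figure
    time svn checkout -q file:///tmp/repo /tmp/wc2

    # Modify ~500 files, then time the incremental commit
    for d in $(seq 1 500); do
        head -c 30000 /dev/urandom | base64 > "dir$d/file1.txt"
    done
    time svn commit -q -m "change 500 files"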

To get an idea of how long it would take to actually compare the contents of all those files, I also ran a diff between two exports of that data (version 1 against version 2).

Diff time: 55 minutes

I'm not sure an SSD would get that commit time down as much as you hope, but I was using a normal single SATA disk for both the 50-second and 55-minute measurements.

To me, these times strongly suggest that the contents of the files are not being checked by svn by default.
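To make that comparison concrete, this is roughly how the export-vs-export diff can be set against the stat-based scan (revision numbers and paths are placeholders matching the sketch above):

    # Export two revisions and compare their full contents on disk;
    # this forces every byte to be read
    svn export -q -r 1 file:///tmp/repo /tmp/export-r1
    svn export -q -r 2 file:///tmp/repo /tmp/export-r2
    time diff -r /tmp/export-r1 /tmp/export-r2 > /dev/null

    # versus the stat-based scan of the working copy
    time svn status /tmp/wc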

This was with svn 1.6.

answered Jan 30 '23 by Jim T