 

350GB SVN repo creates at least a 1MB revision for even the simplest task like a branch/tag

This all started when I noticed that my repository was growing by about 1GB a day. I did a simple test: I created a branch/tag of an existing folder that was 35KB in size. I noted the revision number, went to $REPO/db/revs/<K-rev>/rev-number/ and checked the size of that revision. It was 1 megabyte. That sounds fishy. Any ideas on what might be wrong here? My repo is about 350GB in size, with about 600,000 revisions.
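The manual check described above can be scripted. This is a minimal Python sketch, assuming a sharded FSFS repository whose `db/format` file contains a line such as `layout sharded 1000` (formats without that line are treated as linear), so that revision N lives at `db/revs/<N // shard size>/N`. Shards packed with `svnadmin pack` have no per-revision files, and this sketch does not handle them. The function names and the repository path in the usage line are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_shard_size(format_text: str) -> Optional[int]:
    """Return the shard size from the text of db/format, or None for a linear layout."""
    for line in format_text.splitlines():
        parts = line.split()
        if parts[:2] == ["layout", "sharded"] and len(parts) >= 3:
            return int(parts[2])
    return None


def rev_file_path(repo: Path, rev: int, shard_size: Optional[int]) -> Path:
    """Path of the unpacked revision file for `rev`."""
    revs = repo / "db" / "revs"
    if shard_size is None:
        return revs / str(rev)
    return revs / str(rev // shard_size) / str(rev)


def rev_file_size(repo: str, rev: int) -> int:
    """Size in bytes of the file for revision `rev` (raises if the shard is packed)."""
    root = Path(repo)
    shard_size = parse_shard_size((root / "db" / "format").read_text())
    return rev_file_path(root, rev, shard_size).stat().st_size
```

Usage would look like `rev_file_size("/path/to/repo", 600123)`, run on the server against the repository directory itself, not a working copy.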

P.S. I have already started a rebuild of the whole repository to see if that makes any difference but it will probably take days to complete.

asked Oct 11 '10 at 22:10 by F Yaqoob



1 Answer

I posted the same question to the Subversion mailing list and got this answer from B Smith-Mannschott, which explains everything. I do have a directory with 16,000 folders in the path of every commit. Thank you, B Smith-Mannschott, for the detailed response. I'm posting the reply here for others' benefit.


Does your repository contain a directory with very many entries? Are the changes that produce the large commits being made in or below such a directory?

Let's assume you commit a single change to a single file in your repository. Let's further assume the file is located here:

/project/trunk/some-really-large-directory/notes/blah.txt

When you commit the change to blah.txt, the new revision will rewrite the directory nodes between 'blah.txt' and the root of the repository: /project/trunk/some-really-large-directory/notes, /project/trunk/some-really-large-directory, /project/trunk, /project, /. When rewriting a directory node, FSFS always stores the new version in its entirety. (This is different from the way changes to files are stored, which are generally as differences to some previous version of the same file.)
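The "bubble-up" walk described above is easy to make concrete. This Python sketch (the function name is hypothetical) lists the directory nodes that a change to one file forces into the new revision, innermost first, by repeatedly taking the parent of the repository path until it reaches the root.

```python
import posixpath
from typing import List


def bubble_up_dirs(path: str) -> List[str]:
    """Directory nodes rewritten when the file at `path` changes, innermost first."""
    if not path.startswith("/"):
        # Relative paths would never reach "/" and the loop would not terminate.
        raise ValueError("expected an absolute repository path")
    dirs = []
    parent = posixpath.dirname(path)
    while True:
        dirs.append(parent)
        if parent == "/":
            break
        parent = posixpath.dirname(parent)
    return dirs


for node in bubble_up_dirs("/project/trunk/some-really-large-directory/notes/blah.txt"):
    print(node)
```

The five printed paths are the same five directory nodes listed above; each one is stored again, in full, in the new revision.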

If /project/trunk/some-really-large-directory/ contains, say, 10,000 files, then each commit to blah.txt will store a full copy of this directory (with its 10,000 names) in your repository.
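To see why the entry count of the directory, rather than the size of the change, drives the size of each commit, here is a rough Python estimate of one full directory listing. It assumes each entry is written as a hash-dump record `K <len>\n<name>\nV <len>\ndir <node-rev-id>\n` followed by a closing `END\n`, and it uses a hypothetical fixed node-revision id length; real on-disk sizes depend on the FSFS format version, so treat the numbers as order-of-magnitude only. The function name and the folder names are also hypothetical.

```python
from typing import Iterable


def dir_listing_bytes(names: Iterable[str], node_id_len: int = 24) -> int:
    """Approximate bytes for one full copy of a directory with these entry names.

    Assumes every entry is a subdirectory serialized as
    "K <len>\\n<name>\\nV <len>\\ndir <node-rev-id>\\n"; node_id_len is a
    hypothetical average length of the node-revision id.
    """
    total = 0
    for name in names:
        key_len = len(name.encode("utf-8"))
        value_len = len("dir ") + node_id_len
        total += len(f"K {key_len}\n") + key_len + 1      # key header, name, newline
        total += len(f"V {value_len}\n") + value_len + 1  # value header, value, newline
    return total + len("END\n")                          # hash-dump terminator


names = [f"folder{i:05d}" for i in range(16_000)]
per_commit = dir_listing_bytes(names)
print(f"{per_commit / 1024:.0f} KiB rewritten per commit under this directory")
# prints "797 KiB rewritten per commit under this directory"
```

For a 16,000-entry directory like the one mentioned in the question's answer, this crude model gives roughly 800 KiB per commit, the same order of magnitude as the 1MB revision observed, regardless of how small the actual change was.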

I noticed this when I started keeping a personal wiki under version control a few years ago. It was a flat directory of over 10,000 text files, and I quickly noticed that commits were pretty big. (I've since switched to git for that task, for this and other reasons.)

See also: http://svn.apache.org/repos/asf/subversion/trunk/notes/subversion-design.html#server.fs.struct.bubble-up

answered Sep 28 '22 at 02:09 by F Yaqoob