Distributed version control for HUGE projects - is it feasible?

Question

We're pretty happy with SVN right now, but Joel's tutorial intrigued me. So I was wondering - would it be feasible in our situation too?

The thing is - our SVN repository is HUGE. The software itself has a 15 years old legacy and has survived several different source control systems already. There are over 68,000 revisions (changesets), the source itself takes up over 100MB and I cant even begin to guess how many GB the whole repository consumes.

The problem then is simple - a clone of the whole repository would probably take ages to make, and would consume far more space on the drive that is remotely sane. And since the very point of distributed version control is to have a as many repositories as needed, I'm starting to get doubts.

How does Mercurial (or any other distributed version control) deal with this? Or are they unusable for such huge projects?

Added: To clarify - the whole thing is one monolithic beast of a project which compiles to a single .EXE and cannot be split up.

Added 2: Second thought - The Linux kernel repository uses git and is probably an order of magnitude or two bigger than mine. So how do they make it work?

gavinb · Accepted Answer

Distributed version control for HUGE projects - is it feasible?

Absolutely! As you know, Linux is massive and uses Git. Mercurial is used for some major projects too, such as Python, Mozilla, OpenSolaris and Java.

We're pretty happy with SVN right now, but Joel's tutorial intrigued me. So I was wondering - would it be feasible in our situation too?

Yes. And if you're happy with Subversion now, you're probably not doing much branching and merging!

The thing is - our SVN repository is HUGE. [...] There are over 68,000 revisions (changesets), the source itself takes up over 100MB

As others have pointed out, that's actually not so big compared to many existing projects.

The problem then is simple - a clone of the whole repository would probably take ages to make, and would consume far more space on the drive that is remotely sane.

Both Git and Mercurial are very efficient at managing the storage, and their repositories take up far less space than the equivalent Subversion repo (having converted a few). And once you have an initial checkout, you're only pushing deltas around, which is very fast. They are both significantly faster in most operations. The initial clone is a one-time cost, so it doesn't really matter how long it takes (and I bet you'd be surprised!).

And since the very point of distributed version control is to have a as many repositories as needed, I'm starting to get doubts.

Disk space is cheap. Developer productivity matters far more. So what if the repo takes up 1GB? If you can work smarter, it's worth it.

How does Mercurial (or any other distributed version control) deal with this? Or are they unusable for such huge projects?

It is probably worth reading up on how projects using Mercurial such as Mozilla managed the conversion process. Most of these have multiple repos, which each contain major components. Mercurial and Git both have support for nested repositories too. And there are tools to manage the conversion process - Mercurial has built-in support for importing from most other systems.

Added: To clarify - the whole thing is one monolithic beast of a project which compiles to a single .EXE and cannot be split up.

That makes it easier, as you only need the one repository.

Added 2: Second thought - The Linux kernel repository uses git and is probably an order of magnitude or two bigger than mine. So how do they make it work?

Git is designed for raw speed. The on-disk format, the wire protocol, the in-memory algorithms are all heavily optimized. And they have developed sophisticated workflows, where patches flow from individual developers, up to subsystem maintainers, up to lieutenants, and eventually up to Linus. One of the best things about DVCS is that they are so flexible they enable all sorts of workflows.

I suggest you read the excellent book on Mercurial by Bryan O'Sullivan, which will get you up to speed fast. Download Mercurial and work through the examples, and play with it in some scratch repos to get a feel for it.

Then fire up the convert command to import your existing source repository. Then try making some local changes, commits, branches, view logs, use the built-in web server, and so on. Then clone it to another box and push around some changes. Time the most common operations, and see how it compares. You can do a complete evaluation at no cost but some of your time.

Tadeusz A. Kadłubowski · Answer

100MB of source code is less than the Linux kernel. Changelog between Linux kernel 2.6.33 and 2.6.34-rc1 has 6604 commits. Your repository scale doesn't sound intimidating to me.

Linux kernel 2.6.34-rc1 uncompressed from .tar.bz2 archive: 445MB
Linux kernel 2.6 head checked out from main Linus tree: 827MB

Twice as much, but still peanuts with the big hard drives we all have.

Distributed version control for HUGE projects - is it feasible?

Tags:

dvcs

svn

scalability

mercurial

Vilx-

2 Answers

gavinb

Tadeusz A. Kadłubowski

Recent Activity

Donate For Us

Distributed version control for HUGE projects - is it feasible?

Tags:

dvcs

svn

scalability

mercurial

Vilx-

2 Answers

gavinb

Tadeusz A. Kadłubowski

Related questions

Recent Activity

Donate For Us