
Comparison between Centralized and Distributed Version Control Systems [closed]

From my answer to a different question:

Distributed version control systems (DVCSs) solve different problems than Centralized VCSs. Comparing them is like comparing hammers and screwdrivers.

Centralized VCSs are designed with the intent that there is One True Source that is Blessed, and therefore Good. All developers work from (check out) that source, and then add (commit) their changes, which then become similarly Blessed. The only real difference between CVS, Subversion, ClearCase, Perforce, Visual SourceSafe and all the other CVCSes is in the workflow, performance, and integration that each product offers.

Distributed VCSs are designed with the intent that one repository is as good as any other, and that merges from one repository to another are just another form of communication. Any semantic value as to which repository should be trusted is imposed from the outside by process, not by the software itself.

The real choice between using one type or the other is organizational -- if your project or organization wants centralized control, then a DVCS is a non-starter. If your developers are expected to work all over the country/world, without secure broadband connections to a central repository, then DVCS is probably your salvation. If you need both, you're fsck'd.


To those who think distributed systems don't allow authoritative copies: plenty of projects that use distributed systems have an authoritative copy. The perfect example is probably Linus' kernel tree. Sure, lots of people have their own trees, but almost all of their changes flow toward Linus' tree.

That said, I used to think that distributed SCMs were only useful for lots of developers doing different things, but I have recently decided that anything a centralized repository can do, a distributed one can do better.

For example, say you are a solo developer working on your own personal project. A centralized repository might be an obvious choice, but consider this scenario. You are away from network access (on a plane, at a park, etc.) and want to work on your project. You have your local copy, so you can work fine, but you really want to commit because you have finished one feature and want to move on to another, or you found a bug to fix, or whatever. The point is that with a centralized repo you end up either mashing all the changes together and committing them as one non-logical changeset, or manually splitting them out later.

With a distributed repo you carry on with business as usual: commit, move on, and when you have net access again, push to your "one true repo". Nothing changes.
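
The offline scenario above can be sketched with a toy model. This is not how any real DVCS is implemented; the `Repo` class and its `commit` and `push` methods are hypothetical names made up for illustration, and `push` assumes the remote has not received anyone else's commits in the meantime (a simple fast-forward).

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy repository: just an ordered list of commit messages."""
    name: str
    commits: list = field(default_factory=list)

    def commit(self, message):
        # Committing only touches this repository; no network is involved.
        self.commits.append(message)

    def push(self, remote):
        # Assumption: the remote's history is a prefix of ours (fast-forward).
        # Send every commit the remote does not have yet, in order.
        missing = self.commits[len(remote.commits):]
        remote.commits.extend(missing)
        return len(missing)


central = Repo("one-true-repo")
laptop = Repo("laptop", commits=list(central.commits))  # a local clone

# Offline: each logical change becomes its own commit.
laptop.commit("Finish feature A")
laptop.commit("Fix off-by-one bug in parser")

# Back online: one push delivers both commits, still as separate changesets.
pushed = laptop.push(central)
print(pushed, central.commits)
```

The point of the sketch is only that the two changes stay separate changesets even though neither was committed while online.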

Not to mention the other nice thing about distributed repos: full history available always. You need to look at the revision logs when away from the net? You need to annotate the source to see how a bug was introduced? All possible with distributed repos.

Please, please don't believe that distributed vs. centralized is about ownership or authoritative copies or anything like that. The reality is that distributed is the next step in the evolution of SCMs.


Not really a comparison, but here are what big projects are using:

Centralized VCSes

  • Subversion

    Apache, GCC, Ruby, MPlayer, Zope, Plone, Xiph, FreeBSD, WebKit, ...

  • CVS

    CVS

Distributed VCSes

  • git

    Linux kernel, KDE, Perl, Ruby on Rails, Android, Wine, Fedora, X.org, Mediawiki, Django, VLC, Mono, Gnome, Samba, CUPS, GnuPG, Emacs ELPA...

  • mercurial (hg)

    Mozilla and Mozdev, OpenJDK (Java), OpenSolaris, ALSA, NTFS-3G, Dovecot, MoinMoin, mutt, PETSc, Octave, FEniCS, Aptitude, Python, XEmacs, Xen, Vim, Xine...

  • bzr

    Emacs, Apt, Mailman, MySQL, Squid, ... also promoted within Ubuntu.

  • darcs

    ghc, ion, xmonad, ... popular within Haskell community.

  • fossil

    SQLite


W. Craig Trader said this about DVCS and CVCS:

If you need both, you're fsck'd.

I wouldn't say you're fsck'd when using both. In practice, developers who use DVCS tools usually try to merge their changes (or send pull requests) against a central location, usually a release branch in a release repository. There is some irony in developers using a DVCS but in the end sticking with a centralized workflow; you can start to wonder whether the distributed approach really is better than the centralized one.

There are some advantages with DVCS over a CVCS:

  • The notion of uniquely recognizable commits makes sending patches between peers painless. That is, you make the patch as a commit and share it with the other developers who need it. Later, when everyone wants to merge together, that particular commit is recognized and can be compared between branches, so there is less chance of a merge conflict. Developers tend to send patches to each other by USB stick or e-mail regardless of which versioning tool they use. Unfortunately, in the CVCS case, version control will register the commits as separate, failing to recognize that the changes are the same, which leads to a higher chance of a merge conflict.

  • You can have local experimental branches (a cloned repository can also be considered a branch) that you don't need to show to others. That means breaking changes don't affect other developers as long as you haven't pushed anything upstream. In a CVCS, while you still have a breaking change, you may have to keep it uncommitted until you've fixed it and only then commit. This approach effectively defeats the purpose of using versioning as a safety net, but it is a necessary evil in a CVCS.

  • In today's world, companies often work with off-shore developers (or, even better, with developers who want to work from home). A DVCS helps these kinds of projects because it eliminates the need for a reliable network connection, since everyone has their own repo.
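
The first point above, about uniquely recognizable commits, can be illustrated with a small sketch. It assumes a simplified scheme where a commit's ID is a hash of its parent, author, message, and diff; real tools hash more fields than this, and the `commit_id` and `merge` helpers are hypothetical names used only here.

```python
import hashlib


def commit_id(parent, author, message, diff):
    """Derive an ID from the commit's content (toy scheme, not a real tool's format)."""
    payload = "\n".join([parent or "", author, message, diff])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def merge(history_a, history_b):
    """Union two histories (ID -> message), keeping each commit ID once."""
    merged = dict(history_a)
    merged.update(history_b)
    return merged


base = commit_id(None, "alice", "Initial import", "+hello")
fix = commit_id(base, "alice", "Fix typo", "-helo\n+hello")

# Alice makes the fix; Bob receives the very same commit (by e-mail, USB stick, ...).
alice = {base: "Initial import", fix: "Fix typo"}
bob = {base: "Initial import", fix: "Fix typo"}

# Bob then adds his own work on top of the shared fix.
bob_extra = commit_id(fix, "bob", "Add tests", "+test")
bob[bob_extra] = "Add tests"

# Merging recognizes the shared fix by its ID instead of treating it as a new change.
merged = merge(alice, bob)
print(len(merged), fix in merged)
```

Because the ID is derived from the content, the shared fix shows up once in the merged history no matter how it travelled between the two developers.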

…and some disadvantages that usually have workarounds:

  • Who has the latest revision? In a CVCS, the trunk usually has the latest revision, but in a DVCS it may not be plainly obvious. The workaround is rules of conduct: the developers in a project have to agree on which repo they merge their work against.

  • Pessimistic locks, i.e. locking a file on check-out, are usually not possible because work happens concurrently in separate repositories in a DVCS. File locking exists in version control because developers want to avoid merge conflicts. However, locking slows development down, since two developers can't work on the same piece of code simultaneously as they can with a long transaction model, and it is not a foolproof guarantee against merge conflicts. The only sane ways to combat big merge conflicts, regardless of version control, are to have a good code architecture (low coupling, high cohesion) and to divide up your work tasks so that they have low impact on the code (which is easier said than done).

  • In proprietary projects it would be disastrous if the whole repository became publicly available, even more so if a disgruntled or malicious programmer got hold of a cloned repository. Source code leakage is a severe pain for proprietary businesses. A DVCS makes this plain simple, as you only need to clone the repository, while some CM systems (such as ClearCase) try to restrict that access. However, in my opinion, if your company culture is dysfunctional enough, then no version control in the world will help you against source code leakage.


During my search for the right SCM, I found the following links to be of great help:

  1. Better SCM Initiative : Comparison. Comparison of about 26 version control systems.
  2. Comparison of revision control software. Wikipedia article comparing about 38 version control systems covering topics like technical differences, features, user interfaces, and more.
  3. Distributed version control systems. Another comparison, but focussed mainly on distributed systems.