I can see why distributed source control systems (DVCS, like Mercurial) make sense for open source projects. But do they make sense for an enterprise, compared with a centralized source control system such as TFS? What features of a DVCS make it better or worse suited than a centralized system for an enterprise with many developers?

Distributed Version Control Systems and the Enterprise - a Good mix? [closed]

1 Answer

I have just introduced a DVCS (Git in this case) in a large banking company, where Perforce, SVN or ClearCase were the centralized VCSs of choice.
I already knew of the challenges (see my previous answer "Can we finally move to DVCS in Corporate Software? Is SVN still a 'must have' for development?").

I have been challenged on three fronts:

centralization: while the decentralized model has its merits (private commits, working without the network while having access to the full history), there still needs to be a clear set of centralized repos acting as the main reference for all developers.
authentication: a DVCS lets you "sign off" (commit) your code as... pretty much anyone (author "foo", email "foo@bar.com").
You can run git config user.name foo, or git config user.name whateverNameIFeelToHave, and all your commits will carry a bogus name.
That doesn't mix well with the single centralized "Active Directory" user directory used by big enterprises.
authorization: by default, you can clone, push to, or pull from any repository, and modify any branch or any directory.
For sensitive projects, that can be a blocking issue (the banking world is usually very protective of some pricing or quant algorithms, which require strict read/write access for a very limited number of people).

The answer (for a Git setup) was:

centralization: a single server has been set up for every repository that has to be accessible by all users.
Backups have been taken care of (incremental every day, full every week).
A DRP (Disaster Recovery Plan) has been implemented, with a second server on another site and real-time data replication through SRDF.
This setup in itself is independent of the type of reference system or tool you need (DVCS, or Nexus repo, or main Hudson scheduler, or...): any tool which can be critical for a release into production needs to be installed on servers with backup and DR.

authentication: only two protocols allow users to access the main repos:
- ssh-based, with public/private key:
  - useful for users external to the organization (like off-shore development),
  - and useful for generic accounts that the Active Directory managers don't want to create (because they would be "anonymous" accounts): a real person has to be responsible for each generic account, and that person is the one owning the private key
- https-based, with an Apache server authenticating the users through an LDAP setting: that way, an actual login must be provided for any git operation on those repos.
  Git offers it with its smart http protocol, allowing not just pull (read) through http, but also push (write) through http.

The authentication part is also reinforced at the Git level by a post-receive hook which makes sure that at least one of the commits you are pushing to a repo has a "committer name" equal to the user name detected through the ssh or http protocol.
In other words, you need to set up your git config user.name correctly, or any push you want to make to a central repo will be rejected.
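
To make that check concrete, here is a minimal sketch in Python. It is not the hook that was actually deployed. It assumes the authenticated login reaches the hook through the GL_USER (gitolite) or REMOTE_USER (Apache) environment variable, and that the script is installed as a pre-receive hook, since rejecting a push has to happen before the refs are updated. The function names are made up for illustration.

```python
import os
import subprocess
import sys

# All-zero id git uses for "no commit" on ref creation/deletion (assumes SHA-1 repos).
ZERO = "0" * 40


def parse_ref_updates(lines):
    """Parse the '<old> <new> <ref>' lines git feeds to a pre-receive hook."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def committer_names(old, new):
    """Ask git for the committer names of the commits being pushed."""
    if new == ZERO:  # ref deletion: nothing to inspect
        return []
    # A new ref has no old tip, so list commits not yet reachable from any ref.
    rev_args = [new, "--not", "--all"] if old == ZERO else [f"{old}..{new}"]
    out = subprocess.run(
        ["git", "log", "--format=%cn", *rev_args],
        capture_output=True, text=True, check=True,
    ).stdout
    return [name for name in out.splitlines() if name]


def push_allowed(authenticated_user, names):
    """True when at least one pushed commit carries the authenticated login."""
    return authenticated_user in names


def main():
    # Assumption: the ssh or http front end exported the login in one of these.
    user = os.environ.get("GL_USER") or os.environ.get("REMOTE_USER", "")
    for old, new, ref in parse_ref_updates(sys.stdin):
        names = committer_names(old, new)
        if names and not push_allowed(user, names):
            print(f"rejected {ref}: no commit has committer name '{user}'",
                  file=sys.stderr)
            return 1  # a non-zero exit from pre-receive refuses the whole push
    return 0
```

A real hook script would end with sys.exit(main()). The comparison is a plain string match on the committer name, so it only works if user.name is set to exactly the login the server sees.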

authorization: both previous settings (ssh or https) are wired to call the same set of Perl scripts, named gitolite, with the following parameters:
- the actual username detected by those two protocols
- the git command (clone, push or pull) that user wants to do

The gitolite Perl script parses a simple text file where the authorizations (read/write access for a whole repository, for branches within a given repository, or even for directories within a repository) have been set.
If the access level required by the git command doesn't match the ACL defined in that file, the command is rejected.
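
The decision itself can be sketched in a few lines of Python. This is a toy model of the idea, not gitolite's actual file format or code: the ACL structure, the repo, user and branch names, and the is_allowed function are all hypothetical, and directory-level rules are left out.

```python
import re

# Hypothetical in-memory form of an access file: repo -> ordered rules.
# Each rule is (permission, ref pattern, users).
# "R" grants read access; "RW" grants read and write on matching refs.
ACL = {
    "pricing": [
        ("RW", r"refs/heads/release/.*", {"alice"}),
        ("RW", r"refs/heads/dev/.*", {"alice", "bob"}),
        ("R", r".*", {"carol"}),
    ],
}

# Access level each git command needs.
REQUIRED = {"clone": "R", "pull": "R", "push": "RW"}


def is_allowed(acl, user, repo, command, ref=None):
    """Return True if some rule grants `user` the access `command` needs."""
    needed = REQUIRED[command]
    for permission, pattern, users in acl.get(repo, []):
        if user not in users:
            continue
        if needed == "R":
            return True  # any rule naming the user implies read access
        if permission == "RW" and ref is not None and re.fullmatch(pattern, ref):
            return True
    return False  # default deny: no matching rule means the command is rejected
```

With these rules, bob can push to refs/heads/dev/fix but not to refs/heads/release/1.0, carol can clone but not push, and anyone not listed is refused everything.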

The above describes what I needed to implement for a Git setup, but more importantly, it lists the main issues that need to be addressed for a DVCS setup to make sense in a big corporation with a single user base.

Then, and only then, a DVCS (Git, Mercurial, ...) can add value because of:

data exchange between multiple sites: while those users are all authenticated through the same Active Directory, they can be located across the world (the companies I have worked for usually split development between teams across two or three countries). A DVCS is naturally made for exchanging data efficiently between those distributed teams.
replication across environments: a setup taking care of authentication/authorization allows for cloning those repositories on other dedicated servers (for integration testing, UAT testing, pre-production, and pre-deployment purposes).
process automation: the ease with which you can clone a repo can also be used locally on one user's workstation, for unit-testing purposes with the "guarded commits" techniques and other clever uses: see "What is the cleverest use of source repository that you have ever seen?".
In short, you can push to a second local repo in charge of:
- running various tasks (unit tests or static analysis of the code)
- pushing back to the main repo if those tasks are successful
- all while you keep working in the first repo, without having to wait for the result of those tasks.
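
The gate that second repo applies could look like the sketch below. The function names are hypothetical, and the actual check and push commands depend on the project.

```python
import subprocess


def run_ok(command, cwd=None):
    """Run one command and report whether it exited with status 0."""
    return subprocess.run(command, cwd=cwd).returncode == 0


def guarded_push(checks, push_command, run=run_ok):
    """Run every check in order; push to the main repo only if all of them pass.

    Returns True when all checks passed and the push itself succeeded.
    """
    # all() stops at the first failing check, so later checks are skipped
    # and nothing is pushed.
    if not all(run(check) for check in checks):
        return False
    return run(push_command)
```

A call such as guarded_push([["python", "-m", "pytest"]], ["git", "push", "origin", "main"]) would typically be triggered from a hook in the second repo, which leaves the developer free to keep committing in the first one. Both commands in that call are only examples.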

killer features: any DVCS comes with those, the main one being merging (ever tried to do a complex merge workflow with SVN? Or sloooowly merge 6000 files with ClearCase?).
That alone (merging) means you can really take advantage of branching while being able at all times to merge your code back to another "main" line of development, because you would do so:
- first locally within your own repo, without disturbing anybody
- then on the remote server, by pushing the result of that merge to the central repo.
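
Those two steps map onto a short command sequence, sketched here as a Python helper. The branch and remote names are placeholders, and --no-ff is just one possible merge policy, not something the setup above requires.

```python
import subprocess


def merge_then_push(feature, main="main", remote="origin"):
    """Build the git commands for 'merge locally first, then publish'."""
    return [
        ["git", "checkout", main],             # switch to the main line locally
        ["git", "merge", "--no-ff", feature],  # merge in your own repo first
        ["git", "push", remote, main],         # then publish the merge result
    ]


def run_plan(commands):
    """Run the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Because check=True raises on a failed merge, a conflict stops the sequence before anything reaches the central repo.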

answered Oct 01 '22 06:10

VonC

Tags: git, version-control, dvcs, mercurial, tfs

Raj Rao
