Git-Based Source Control in the Enterprise: Suggested Tools and Practices?

Tags: git, enterprise

Contrary to common opinion, I think a DVCS is an ideal choice in an enterprise setting because it enables very flexible workflows. I will first discuss DVCS vs. CVCS, then best practices, and finally git in particular.

DVCS vs. CVCS in an enterprise context:

I won't go into the general pros and cons here, but rather focus on your context. The common perception is that a DVCS requires a more disciplined team than a centralized system. A centralized system gives you an easy way to enforce your workflow, whereas a decentralized system requires more communication and discipline to stick to the established conventions. While this may seem like overhead, I see a benefit in the increased communication needed to make it a good process. Your team will need to communicate about code, about changes, and about project status in general.

Another dimension of discipline is encouraging branching and experiments. Martin Fowler's recent bliki entry on Version Control Tools has a very concise description of this phenomenon:

DVCS encourages quick branching for experimentation. You can do branches in Subversion, but the fact that they are visible to all discourages people from opening up a branch for experimental work. Similarly a DVCS encourages check-pointing of work: committing incomplete changes, that may not even compile or pass tests, to your local repository. Again you could do this on a developer branch in Subversion, but the fact that such branches are in the shared space makes people less likely to do so.

A DVCS enables flexible workflows because it tracks changesets via globally unique identifiers in a directed acyclic graph (DAG) instead of simple textual diffs. This allows it to transparently track the origin and history of a changeset, which can be quite important.
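To make the DAG point concrete, here is a minimal sketch using git in a throwaway repository. The branch names (`trunk`, `feature`) and file names are made up for illustration. It shows that a merge commit records both parents and that the tool can find the common ancestor on its own.

```shell
#!/bin/sh
# Sketch only: builds a tiny history in a temp directory and inspects the graph.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo base > file.txt
git add file.txt
git commit -q -m "base"
git branch -M trunk                 # name the first branch explicitly

git checkout -q -b feature          # diverge on a feature branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q trunk               # trunk moves on independently
echo trunk > trunk.txt
git add trunk.txt
git commit -q -m "trunk work"

# The common ancestor is found from the graph, not from file contents
base=$(git merge-base trunk feature)

git merge -q --no-edit feature      # creates a commit with two parents

git --no-pager log --graph --oneline --all
git cat-file -p HEAD | grep '^parent '   # prints two parent hashes
```

Every line of the graph output is a changeset identified by its hash, which is what lets a later merge know exactly which changes each side already has.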

Workflows:

Larry Osterman (a Microsoft dev working on the Windows team) has a great blog post about the workflow they employ at the Windows team. Most notably they have:

  • A clean, high quality code only trunk (master repo)
  • All development happens on feature branches
  • Feature teams have team repos
  • They regularly merge the latest trunk changes into their feature branch (Forward Integrate)
  • Complete features must pass several quality gates, e.g. review, test coverage, QA (repos on their own)
  • If a feature is completed and has acceptable quality it is merged into the trunk (Reverse Integrate)

As you can see, because each of these repositories lives on its own, you can decouple teams that advance at different paces. The possibility of implementing a flexible quality gate system also distinguishes a DVCS from a CVCS. You can solve your permission issues at this level too: only a handful of people should be allowed access to the master repo. For each level of the hierarchy, have a separate repo with the corresponding access policies. This approach can be very flexible at the team level. Leave it up to each team to decide whether they want to share their team repo among themselves or prefer a more hierarchical approach where only the team lead may commit to the team repo.
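The Forward Integrate / Reverse Integrate cycle described above might look roughly like this with git. This is only a sketch under my own assumptions: the repo names (`trunk`, `team`), the branch `feature-x` and the files are hypothetical, and the quality gates are reduced to a comment.

```shell
#!/bin/sh
# Sketch only: a trunk repo and a team repo living side by side in a temp dir.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Trunk ("master repo"), seeded with one commit
git init -q trunk
echo v1 > trunk/app.txt
git -C trunk add app.txt
git -C trunk commit -q -m "trunk: initial"
git -C trunk branch -M trunk

# The team repo is a clone; development happens on a feature branch
git clone -q trunk team
git -C team checkout -q -b feature-x
echo feature > team/feature.txt
git -C team add feature.txt
git -C team commit -q -m "feature-x: work"

# Meanwhile, trunk advances
echo v2 > trunk/app.txt
git -C trunk commit -q -am "trunk: fix"

# Forward Integrate: bring the latest trunk changes into the feature branch
git -C team fetch -q origin
git -C team merge -q --no-edit origin/trunk

# ... review, tests and other quality gates would run here ...

# Reverse Integrate: someone with trunk access pulls the finished feature
git -C trunk fetch -q ../team feature-x
git -C trunk merge -q --no-edit FETCH_HEAD
```

Because the feature branch already contains the latest trunk, the final merge is a plain fast-forward, which is the whole point of integrating forward first.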

Hierarchical Repositories

(The picture is stolen from Joel Spolsky's hginit.com.)

One thing remains to be said at this point, though: even though a DVCS provides great merging capabilities, it is never a replacement for Continuous Integration. Even there you have a great deal of flexibility: CI for the trunk repo, CI for team repos, QA repos, etc.

Git in an enterprise context:

Git is maybe not the ideal solution for an enterprise context as you have already pointed out. Repeating some of your concerns, I think most notably they are:

  • Still somewhat immature support on Windows (please correct me if that has changed recently). Update: Windows now has the GitHub for Windows client, TortoiseGit, and SourceTree from Atlassian.
  • Lack of mature GUI tools, no first class citizen vdiff/merge tool integration
  • Inconsistent interface with a very low level of abstractions on top of its inner workings
  • A very steep learning curve for svn users
  • Git is very powerful and makes it easy to modify history, very dangerous if you don't know what you are doing (and you will sometimes even if you thought you knew)
  • No commercial support options available

I don't want to start a git vs. hg flamewar here; you have already taken the right step by switching to a DVCS. Mercurial addresses some of the points above, and I think it is therefore better suited to an enterprise context:

  • All platforms that run Python are supported
  • Great GUI tools on all major platforms (Windows/Linux/OS X), first class merge/vdiff tool integration
  • Very consistent interface, easy transition for svn users
  • Can do most of the things git can do too, but provides a cleaner abstraction. Dangerous operations are always explicit. Advanced features are provided via extensions that must be explicitly enabled.
  • Commercial support is available from Selenic.

In short, when using a DVCS in an enterprise I think it's important to choose a tool that introduces the least friction. For the transition to be successful, it's especially important to consider the varying skill levels among developers (with regard to VCS).


Reducing friction:

Ok, since you appear to be really stuck with the situation, there are two options left IMHO. There is no tool to make git less complicated; git is complicated. Either you confront this or you work around git:

  1. Get a git introductory course for the whole team. This should include the basics only and some exercises (important!).
  2. Convert the master repo to svn and let the "young-stars" use git-svn. This gives most of the developers an easy-to-use interface and may compensate for the lacking discipline in your team, while the young-stars can continue to use git for their own repos.

To be honest, I think you really have a people problem rather than a tool problem. What can be done to improve upon this situation?

  • You should make it clear that you think your current process will end up with a maintainable codebase.
  • Invest some time in Continuous Integration. As I outlined above, regardless of which kind of VCS you use, there's never a replacement for CI. You stated that there are people who push crap into the master repo: have them fix their crap while a red alert goes off and blames them for breaking the build (or not meeting a quality metric or whatever).

I'm the SCM engineer for a reasonably large development organization, and we converted to git from svn over the last year or so. We use it in a centralized fashion.

We use gitosis to host the repositories. We broke our monolithic svn repositories up into many smaller git repositories as git's branching unit is basically the repository. (There are ways around that, but they're awkward.) If you want per-branch kinds of access controls, gitolite might be a better approach. There's also an inside-the-firewall version of GitHub if you care to spend the money. For our purposes, gitosis is fine because we have pretty open permissions on our repositories. (We have groups of people who have write access to groups of repositories, and everyone has read access to all repositories.) We use gitweb for a web interface.

As for some of your specific concerns:

  • merges: You can use a visual merge tool of your choice; there are instructions in various places on how to set it up. The fact that you can do the merge and check its validity totally on your local repo is, in my opinion, a major plus for git; you can verify the merge before you push anything.
  • GUIs: We have a few people using TortoiseGit but I don't really recommend it; it seems to interact in odd ways with the command line. I have to agree that this is an area that needs improvement. (That said, I am not a fan of GUIs for version control in general.)
  • small-group tracking branches: If you use something that provides finer-grained ACLs like gitolite, it's easy enough to do this, but you can also create a shared branch by connecting various developers' local repositories — a git repo can have multiple remotes.
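As a rough illustration of the last point, here is how two developers could share a branch directly, without going through the central server. Everything named here (`alice`, `bob`, the `shared` branch, the local paths standing in for ssh or https URLs) is hypothetical.

```shell
#!/bin/sh
# Sketch only: two developer repos in a temp dir that add each other as remotes.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Alice starts the shared branch in her own repository
git init -q alice
echo draft > alice/notes.txt
git -C alice add notes.txt
git -C alice commit -q -m "alice: start shared work"
git -C alice branch -M shared

# Bob adds Alice's repo as a remote and picks up the branch
git init -q bob
git -C bob remote add alice "$work/alice"
git -C bob fetch -q alice
git -C bob checkout -q -b shared alice/shared
echo "bob's addition" >> bob/notes.txt
git -C bob commit -q -am "bob: extend the notes"

# Alice adds Bob as a second remote and pulls his work back
git -C alice remote add bob "$work/bob"
git -C alice pull -q --ff-only bob shared

git -C alice remote -v              # lists every remote this repo knows about
```

In a real setup Alice would usually have `origin` pointing at the central server as well; the extra remotes simply sit alongside it.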

We switched to git because we have lots of remote developers, and because we had many issues with Subversion. We're still experimenting with workflows, but at the moment we basically use it the same way as we used to use Subversion. Another thing we liked about it was that it opened up other possible workflows, like the use of staging repositories for code review and sharing of code among small groups. It's also encouraged a lot of people to start tracking their personal scripts and so forth because it's so easy to create a repository.


Yes, I know, Linus never intended it for that.

Actually, Linus argues that centralized systems just can't work.

And, what's wrong with the dictator-and-lieutenants workflow?

diagram

Remember, git is a distributed system; don't try to use it like a central one.

(updated)

Most of your problems will go away if you don't try to use git as if it were "svn on steroids" (because it's not).

Instead of using a bare repository as a central server that everyone can push to (and potentially screw up), set up a few integration managers who handle merges, so that only they can push to the bare repository.

Usually these people should be the team leads: each leader integrates his own team's work and pushes it to the blessed repository.

Even better, someone else (i.e. a dictator) pulls from the team leaders and integrates their changes into the blessed repository.

There's nothing wrong with that workflow, but we're an overworked startup and need our tools to substitute for human time and attention; nobody has bandwidth to even do code reviews, let alone be benevolent dictator.

If the integrators don't have time to review code, that's fine, but you still need to have people that integrate the merges from everybody.

Doing git pulls doesn't take all that much time.

git pull A
git pull B
git pull C
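To make those pulls a bit more concrete, here is a sketch of an integrator's session. It assumes a bare "blessed" repository and two hypothetical team-lead repos named A and B, all on local paths; in practice these would be URLs, and only the integrator would have push access to the blessed repo.

```shell
#!/bin/sh
# Sketch only: an integrator pulls from two team leads and pushes to the blessed repo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

git init -q --bare blessed.git      # the blessed repository

# The integrator seeds it with an initial commit
git init -q integrator
echo project > integrator/README
git -C integrator add README
git -C integrator commit -q -m "initial"
git -C integrator branch -M master
git -C integrator remote add blessed "$work/blessed.git"
git -C integrator push -q blessed master

# Each team lead clones the blessed repo and commits their team's work
for lead in A B; do
  git clone -q -b master "$work/blessed.git" "$lead"
  echo "work from team $lead" > "$lead/team-$lead.txt"
  git -C "$lead" add .
  git -C "$lead" commit -q -m "team $lead: feature"
  git -C integrator remote add "$lead" "$work/$lead"
done

# The integrator's whole job, in three commands
git -C integrator pull -q --no-edit --no-rebase A master
git -C integrator pull -q --no-edit --no-rebase B master
git -C integrator push -q blessed master
```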

git does substitute for human time and attention; that's why it was written in the first place.

  • The GUI tools aren't mature

The gui tools can handle the basic stuff pretty well.

Advanced operations require a coder/nerdy mindset (e.g. I'm comfortable working from the command line). It takes a bit of time to grasp the concepts, but it's not that hard.

  • Using the command line tools, it's far too easy to screw up a merge and obliterate someone else's changes

This won't be a problem unless you have many incompetent developers with full write access to the "central repository".

But, if you set up your workflow so that only a few people (integrators) write to the "blessed" repository, that won't be a problem.

Git doesn't make it easy to screw up merges.

When there are merge conflicts, git will clearly mark the conflicting lines so you know which changes are yours and which are not.

It's also easy to obliterate other people's code with svn or any other (non-distributed) tool. In fact, it's way easier with these other tools because you tend to "sit on changes" for a long time and at some point the merges can get horribly difficult.

And because these tools don't know how to merge, you end up always having to merge things manually. For example, as soon as someone makes a commit to a file you're editing locally, it will be marked as a conflict that needs to be manually resolved; now that is a maintenance nightmare.

With git, most of the time there won't be any merge conflicts because git can actually merge. In the case where a conflict does occur, git will clearly mark the lines for you so you know exactly which changes are yours and which changes are from other people.
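Here is a small sketch of what that marking looks like. The file name and the branch name `colleague` are made up, and the markers shown in the comments assume git's default conflict style.

```shell
#!/bin/sh
# Sketch only: provoke a conflict in a temp repo and look at the markers.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo "timeout = 30" > config.txt
git add config.txt
git commit -q -m "base"
git branch -M master

git checkout -q -b colleague        # someone else changes the line
echo "timeout = 60" > config.txt
git commit -q -am "raise timeout"

git checkout -q master              # you change the same line
echo "timeout = 10" > config.txt
git commit -q -am "lower timeout"

# The merge stops and exits non-zero, so guard it under `set -e`
git merge --no-edit colleague >/dev/null 2>&1 || true

cat config.txt
# <<<<<<< HEAD
# timeout = 10
# =======
# timeout = 60
# >>>>>>> colleague

git diff --name-only --diff-filter=U   # lists the files still in conflict
```

The part above `=======` is yours, the part below it is the other side, and nothing is committed until you resolve the file and commit.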

If someone obliterates other people's changes while resolving a merge conflict, it won't be by mistake: it will either be because it was necessary for the conflict resolution, or because they don't know what they're doing.

  • It doesn't offer per-user repository permissions beyond global read-only or read-write privileges

  • If you have a permission to ANY part of a repository, you can do that same thing to EVERY part of the repository, so you can't do something like make a small-group tracking branch on the central server that other people can't mess with.

  • Workflows other than "anything goes" or "benevolent dictator" are hard to encourage, let alone enforce

These problems will go away when you stop trying to use git as if it was a centralized system.

  • It's not clear whether it's better to use a single big repository (which lets everybody mess with everything) or lots of per-component repositories (which make for headaches trying to synchronize versions).

Judgment call.

What kind of projects do you have?

For example: does version x.y of project A depend on exactly version w.z of project B such that every time you check x.y of project A you also have to checkout w.z of project B, otherwise it won't build? If so I'd put both project A and project B in the same repository, since they're obviously two parts of a single project.

The best practice here is to use your brain.

  • With multiple repositories, it's also not clear how to replicate all the sources someone else has by pulling from the central repository, or to do something like get everything as of 4:30 yesterday afternoon.

I'm not sure what you mean.


I highly recommend http://code.google.com/p/gerrit/ for enterprise work. It gives you access control plus a built-in review based workflow. It authenticates against any LDAP system. You can hook it up to Hudson with http://wiki.hudson-ci.org/display/HUDSON/Gerrit+Plugin, letting you build and test changes while they're still under review; it's a really impressive setup.

If you decide to use gerrit, I recommend trying to keep a pretty linear history rather than the branchy history that some of the open source guys prefer. Gerrit phrases this as "allow fast-forward changes only." Then you can use branching and merging more in the way you're used to, for releases and whatnot.
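For what it's worth, here is a local sketch of what "fast-forward only" means in practice, without any Gerrit server involved. The branch and file names are hypothetical. With Gerrit itself you would normally push the rebased change for review to `refs/for/<branch>` instead of merging it yourself.

```shell
#!/bin/sh
# Sketch only: keep history linear by rebasing before a fast-forward-only merge.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
echo one > a.txt
git add a.txt
git commit -q -m "base"
git branch -M master

git checkout -q -b topic
echo topic > t.txt
git add t.txt
git commit -q -m "topic change"

git checkout -q master
echo two > b.txt
git add b.txt
git commit -q -m "master moves on"

# The branches have diverged, so a fast-forward-only merge is refused
if git merge -q --ff-only topic 2>/dev/null; then ff_before=yes; else ff_before=no; fi

# Replay the topic commit on top of the current master, then fast-forward
git checkout -q topic
git rebase -q master
git checkout -q master
git merge -q --ff-only topic

merges=$(git rev-list --merges --count HEAD)   # number of merge commits in history
git --no-pager log --oneline
```

The resulting log is a straight line of three commits with no merge commits, which is the shape of history that setting is after.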