
Using git/mercurial on projects with continuous refactoring?


I am trying to understand whether I really have a case for using git/mercurial.

The projects I work on are Java and C# projects, usually with 5-20 people working towards a common goal (the "release"). Most of the developers are professionals who refactor code all the time. So where typical Linux kernel development consists of a large number of relatively independent changes in separate files, we have a constant flux of refactoring changes - often hitting a lot of files and a lot of code. No one here is scared of changing code.

Now, with Subversion, we solve this by staying extremely close to SVN HEAD. Some of us even have automated svn up runs that trigger on the build server's Jabber broadcast. Most of us have also learned (or learn really quickly) how to plan our work to stay close to SVN HEAD: if you're doing a major refactoring, you incrementally bend the source tree in the new direction instead of going away for too long, and sometimes you plan the refactoring so it starts off in the less contended areas. After some years of working this way it becomes second nature. Most of us simply never leave the "comfort zone" of being less than two hours away from SVN HEAD. The automated build of SVN HEAD is the project's "pulse", and we like it.
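For concreteness, here is roughly what one of those automated updates amounts to (the path, build command, and trigger wiring are illustrative, not our exact setup):

    #!/bin/sh
    # Sketch: fired when the build server broadcasts a green build.
    cd "$HOME/work/project" || exit 1
    svn update            # pull the team's latest changes into the working copy
    mvn -q test           # rebuild and retest locally to catch breakage early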

Of course we branch off each release, but the number of backmerges from the release branch to trunk dwindles quickly enough to be insignificant (we've got decent test coverage). Running off for days or weeks with private branches of the source sounds like something we actively want to discourage, and it simply doesn't happen very often.
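(For reference, our release branching in Subversion looks roughly like this; the repository URLs and revision number are invented:)

    svn copy http://svn.example.com/repos/trunk \
             http://svn.example.com/repos/branches/release-1.4 \
             -m "Branch for the 1.4 release"

    # the occasional backmerge of a release fix, run in a trunk working copy:
    svn merge -c 5123 http://svn.example.com/repos/branches/release-1.4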

Both git and mercurial sound way cool, git slightly more so since I'm more of a MacGyver type than a James Bond type. But when it comes to building a case for actually switching, it feels like Linus and I are living on two different planets: most of the time we want our teams to stay focused on HEAD.

How can git make my version control better? How would git let me improve my process? Am I a Subversion dinosaur?

asked Dec 10 '08 by krosenvold


2 Answers

In terms of mentality, there is sometimes a benefit in being able to create a soft-branch, perform all your changes in that soft-branch, test the result of the changes there, and then, when the soft-branch is "complete", reintegrate it with the main branch locally, retest it, and only then propagate it.

In some cases this is better than chasing HEAD, because you don't have the constant interruption of other people's debug code fouling up your work and adding spurious errors for you to handle.

It also means you can commit more often, giving a more comprehensive history, because your commits don't instantly turn up everywhere and create problems.

Also, when reconciling the soft-branch with the shared mainline, you get to see one nice, complete change-set showing both all of your changes and all of theirs, which opens the door for good code review opportunities.

Additionally, from a testing perspective, if you have more soft-branches, you can run the tests on each soft-branch before merging it back into the main branch, and enforce a standard by which a branch does not get merged back into the main branch until it has:

  1. Passed tests on its own
  2. Passed tests after the main branch's changes have been reconciled into the soft-branch

This gives you an extra guarantee of code quality, in that your main collaboration branch stays extra squeaky clean, because failing code is never permitted to appear on it.

(This also limits the problem-solving domain: for the most part you only have to test your own changes, and only when you are "done" do you have to worry about what everyone else has done. What they have done should also be passing tests, which means that when something does fail, you mostly only have to look at what you did.)
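As a concrete sketch, the gate described above might look like this in git (the branch name and build command are invented for illustration):

    git checkout -b refactor-widgets   # create the local soft-branch
    # ... edit, committing early and often ...
    git commit -am "Extract the widget interface"

    mvn test                           # gate 1: the branch passes tests on its own

    git fetch origin                   # gate 2: reconcile the mainline into the
    git merge origin/master            #         soft-branch, then retest the combination
    mvn test

    git checkout master                # only now does the work reach the shared branch
    git merge refactor-widgets
    git push origin master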

"But would I continuously update from the central repo's head into my soft-branch? This is really the essence of my problem."

The beauty of the branch system is that you can pull "whatever's been deemed stable by others" into your local copy as needed.

"Continuous Update" becomes unnecessary, because you don't have the same problems manifesting.

a  b   center
         |
         |
         /      Key:   / - | \   = Flow
/----<---|             < >       = Flow Directions
|  /--<--/             *         = Commit 
*  |     |             T         = Test
|  *     |             M         = Merging with "Current" state of common branch
*  |     |             C         = Tests Complete and Current state is "sane"
|  *     |
T  |     |
|  T     |
|  C     |
|  |/-<--/
*  M     |
|  T     |
*  C     |
|  \--->-\
*  /---<-/
T  |     |
C  *     |
|/-----<-/
M  |     |
T  *     |
|  T     |
*  C     |
|  |/-<--/
*  M     |
T  T     |
C  \-->--\
|/---<---/
M        |
T        |
C        |
\---->---\
         |

Also, because of how the system works, later on, this could also occur:

a  b   center
|  |     |
T  |     |
C  *     |
|/</     |
M  |     |
|  *     |
T        |
C        |
|/----<--/
M        |
T        |
C        |
\-->-----\
         |

The entire concept of there being a "head" in such a scenario vanishes. There are dozens of heads, and which one you see depends on your perspective.

I might also add that these logical branches, although displayed as separate here, could quite feasibly represent either separate checkout locations or merely different soft-branches on a single machine; a and b could in fact be a single developer.

In essence, "Continuously updating my softbranch from mainbranch", is conceptually meaningless. Because in fact, there will be changes not represented in mainbranch yet, and when will you know that they have been pushed or not? SVN Gives you this false illusion of a "singular" code state, when in reality, the instant one user opens a file in their text editor, they have in fact created a very short life soft-branch, a change, that is occurring,that nobody knows about, and for this illusion to be sustained the way you think it works, in effect, the user has to commit after every character, which is hardly practical. So in reality, people get used to the fact that different locations get "out-of-sync" with each other, and learn ways to solve it so it's no longer a problem.

Also, the "constantly updating my tree with everyone elses changes" has a core problem, in that, you have far too many distractions, you are constantly being bombarded with everything everyone else is doing, and if they're making a series of 1 line commits to test something they cant test on their own machine, then you have a nightmare with the ever changing file, and the users seeing the seemingly random changes cant make sense of them.

By permitting longer runs between commits, and then seeing the net result in batches - only the net result of your peers' changes, all at once - you can see immediately what code has changed since you checked out, and get a cohesive overview of what it means for your code, so you can just write your own code and get it done.
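In git, that batched view of your peers' net changes is a couple of commands:

    git fetch origin                   # bring in everyone's work without touching your files
    git log HEAD..origin/master        # the commits your peers have made since you branched
    git diff HEAD...origin/master      # their net effect, as one cohesive change-set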

If you have any doubts

Start off with something simple and don't transition cold turkey; some of the concepts in DSCMs can be a bit daunting (I've seen many people tragically fail to understand the concept of vertically stacked soft-branches). Move a small, non-essential part of the codebase to Git/Mercurial and play with it for a while, and experiment with the benefits and what it can do. There's no better proof than experiencing it yourself: all my lovely explanations are unlikely to communicate what you need to understand, which can only be learned by trying it, and by failing a few times (because failure is a key part of learning).
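One low-risk way to run that experiment while keeping Subversion as the system of record is git-svn (the repository URL below is invented, and -s assumes a standard trunk/branches/tags layout):

    git svn clone -s http://svn.example.com/repos/small-component
    cd small-component

    git checkout -b experiment         # branch, refactor, merge, and discard freely;
                                       # nothing here touches the team's SVN server

    git svn dcommit                    # if you like the result, push the commits back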

answered by Kent Fredric


The way your team uses Subversion means that there's quite a lot of merge effort happening. Almost every time a team member updates to the most recent mainline, they are merging their working copy with the latest mainline, even if they don't commit their own changes at that point. The merging overhead tends towards being the product of the commit rate and the number of team members, and since the commit rate is itself a function of the number of team members, your source code management overhead is O(N^2).
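Spelled out (with N team members each committing at some average rate r, so the mainline sees commits at rate rN, which each of the N working copies has to absorb):

    \text{merge effort} \;\propto\; \underbrace{N}_{\text{working copies updating}} \times \underbrace{rN}_{\text{mainline commit rate}} \;=\; O(N^2)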

In the Mercurial/Git model, this merging effort gets shared out among the team. If you routinely pull changesets from everyone, you'll often find that others have already done nearly all the merge work you would have had to do, and in the cases where a merge has broken something, often someone will already have fixed it. Reconciling a pair of branches only has to be done once, and since the rate of generating new branches is proportional to the number of team members, the source code management overhead is O(N).
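A sketch of routinely pulling changesets from everyone, in git (the remote names and URLs are invented):

    git remote add alice git://dev1.example.com/project.git   # one-time setup
    git remote add bob   git://dev2.example.com/project.git

    git fetch alice                    # routinely pick up their work, including
    git fetch bob                      # any merges they have already done
    git merge alice/master             # if alice already reconciled bob's changes,
                                       # this merge is trivial: done once, shared by all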

I'd expect 20 developers working close to the head on one branch to involve a fair amount of merging work (dealing with conflicts, and with regressions due to independent development), so I'd be surprised if you tried Mercurial/Git and did not find a useful productivity win. Do you think you could manage 100 developers with your current process? I'd estimate the number of Linux kernel developers at around 4000, and yet they get a lot done, so their total source code management overhead must be acceptable.

As long as your source code management system tracks common parentage, merges are often automatic. Each merge has a chance of either producing a textual conflict or breaking the build and tests through interaction between changes; the fewer merges you have to do, the less time you'll spend dealing with either kind of problem.

Consequently, a really big win is that Mercurial/Git users no longer fear branches. Before BitKeeper (the tool that really introduced this way of working, as far as I know), long-lasting branches were time bombs ticking away, occasionally exploding and taking a week to recover from. Now we can branch without hesitation, confident that we will be able to merge later. In a healthy team, others can see your branches in progress, merge them into theirs, and commit your efforts later if they think it is worthwhile. In a centralised model, if each developer had 3 or 4 active branches, and I'm correct in saying the overhead is O(N^2), then you've just increased your source code management overhead by a factor of roughly 3^2, i.e. close to an order of magnitude, which is probably enough to really hurt overall health, wealth and happiness. This is the underlying reason that team members (and managers) typically fear branches in centralised systems and work to minimise them.

answered by Dickon Reed