
Policy for fixing broken nightly builds [closed]

I guess everybody agrees that continuous builds and continuous integration benefit the quality of a software product. Defects are found early, so they can be fixed ASAP. For continuous builds, which take several minutes, it is usually easy to find who caused the defect. However, for nightly integration tests, which take a long time to run, this may be a challenge. Here are the specifics of the situation for which I'm looking for an optimal solution:

  • Running the integration tests takes more than 1 hour, so they are run overnight. Multiple check-ins happen every day (team of about 15 developers), so it is sometimes difficult to find the "culprit" (if any).
  • The integration testing environment depends on other environments (web services and databases), which may fail from time to time. When they do, the integration tests fail even though no check-in is at fault.

So how should the team be organized so that these failures are fixed early? In my opinion, someone should be appointed to DIAGNOSE the defect(s). This should be their first task in the morning. If they need the expertise of others, those people should be readily available. Once the source of the failure (component, database, web service) is determined, its owner should start fixing it (or another team should be notified).
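To make the diagnosis step concrete, here is a minimal sketch of how the morning triage could be automated in part. It assumes each failed test records which component it exercises and which external services it depends on, and that a list of services known to be down is available. The names `TestFailure`, `triage`, `unhealthy_services` and `owners` are all hypothetical, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass
class TestFailure:
    test_name: str
    component: str       # hypothetical: component the test exercises
    dependencies: tuple  # hypothetical: external services the test relies on


def triage(failures, unhealthy_services, owners):
    """Group failed tests by who should look at them first.

    A failure whose dependencies include a service known to be down is
    attributed to the environment; otherwise it goes to the owner of the
    component, or to "unassigned" if no owner is registered.
    """
    report = {}
    for failure in failures:
        broken = sorted(set(failure.dependencies) & set(unhealthy_services))
        if broken:
            target = "environment:" + ",".join(broken)
        else:
            target = owners.get(failure.component, "unassigned")
        report.setdefault(target, []).append(failure.test_name)
    return report
```

A report like this would not replace the person doing the diagnosis, but it might separate "the payments service was down" from "something in our code broke" before anyone starts reading commits.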

How should the person who diagnoses the defects be appointed? Ideally, someone would volunteer (ha ha). This won't happen very often, I'm afraid. I've heard of another option: whoever comes to the office first checks the results of the nightly builds. This is OK if the whole team agrees. However, it rewards those who come in late. I suppose that this role should rotate within the team. The excuse "I don't know much about builds" should not be accepted. Diagnosing the source of a failure should be rather straightforward. If it is not, then adding more diagnostic logging to the code should improve visibility into integration test failures.
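A rotation like this can be made mechanical so that nobody has to negotiate it each morning. The sketch below assumes a daily rotation over working days (Monday to Friday) starting from an agreed date; the rotation period, the function names and the team list are assumptions for illustration only.

```python
from datetime import date, timedelta


def working_days_between(start, day):
    """Count Monday-to-Friday days in the half-open range [start, day)."""
    count = 0
    current = start
    while current < day:
        if current.weekday() < 5:  # 0..4 are Monday..Friday
            count += 1
        current += timedelta(days=1)
    return count


def on_duty(day, team, start):
    """Return the team member who diagnoses the nightly build on `day`."""
    if day < start:
        raise ValueError("day is before the start of the rotation")
    return team[working_days_between(start, day) % len(team)]
```

For example, with `team = ["ann", "bob", "cy"]` and a rotation starting on a Monday, `ann` takes Monday, `bob` Tuesday, `cy` Wednesday, and so on, skipping weekends.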

Any experience in this area or suggestions for improvements of the above approach?

lumi77 asked Dec 22 '09 15:12



2 Answers

A famous policy about broken nightly builds, attributed to Microsoft, is that the person whose commit broke the build becomes responsible for maintaining the nightly builds until someone else breaks them.

That makes sense, since

  • everyone makes mistakes, so the necessary rotation will occur (supplemented with a least-recently-used choice for ambiguous cases, i.e. when several people could be the culprit, pick the one who held the role longest ago)
  • it encourages people to write better code
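The policy above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `BuildSheriff` and its methods are hypothetical, every suspect is assumed to be a team member, and the least-recently-used rule is applied only when more than one person is a suspect for the break.

```python
class BuildSheriff:
    """Track who maintains the nightly build under a 'breaker owns it' policy."""

    def __init__(self, team, initial):
        # Queue ordered from least recently on duty to most recently on duty.
        self.queue = [member for member in team if member != initial] + [initial]
        self.current = initial

    def record_broken_build(self, suspects):
        """Hand the role to whoever broke the build and return the new owner.

        With several suspects, the one who held the role least recently
        takes it. With no suspects (for example an environment failure),
        the current owner keeps the role.
        """
        if not suspects:
            return self.current
        # list.index raises ValueError for a suspect who is not on the team.
        chosen = min(suspects, key=self.queue.index)
        self.queue.remove(chosen)
        self.queue.append(chosen)
        self.current = chosen
        return chosen
```

Keeping the queue explicit means the ambiguous case is resolved by a rule everyone can inspect rather than by argument.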
P Shved answered Sep 28 '22 20:09



What I generally do (I've done it for a team of between 8 and 10 people) is to have one person who checks the build as the first thing he does in the morning -- some would say he is responsible for QA, I suppose.

If there is a problem, he's responsible for finding out what went wrong and how -- of course, he can ask for help from the other members of the team, if needed.

This means at least one member of the team has to have a thorough knowledge of the whole application -- but that's not a bad thing anyway: it will help diagnose problems the day the application is used in production and suffers a failure.

And instead of having one person do that, I like it when there are two: one for one week, the other for the next week, for instance. This way, there is a greater chance of always having someone who can diagnose problems, even if one of them is on holiday.


As a side note: the more useful things you log during the build, the easier it is to find out what went wrong -- and why.
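As a rough illustration of what "useful things" could mean for integration tests that depend on external services, the sketch below wraps each call to a dependency so that the service name, its address, and any exception are written to the log. It uses Python's standard `logging` module; the logger name `nightly` and the helper `call_dependency` are assumptions made for the example.

```python
import logging

logger = logging.getLogger("nightly")


def call_dependency(name, url, action):
    """Run `action` against an external dependency, logging context around it.

    On failure the full traceback is logged together with the dependency
    name and address, and the original exception is re-raised.
    """
    logger.info("calling %s at %s", name, url)
    try:
        result = action()
    except Exception:
        logger.exception("dependency %s at %s failed", name, url)
        raise
    logger.info("dependency %s responded", name)
    return result
```

With entries like these in the nightly log, whoever checks the build can often tell at a glance whether a test failed because of the code under test or because a web service or database was unreachable.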


Why not let everyone in the team check the build every morning?

  • Well, not everyone wants to, first of all -- and the job will be done better if the person doing it likes what he does
  • And you don't want 10 people spending half an hour every day on that ^^
Pascal MARTIN answered Sep 28 '22 20:09
