
Performing Historical Builds with Mercurial

Background

We use a central repository model to coordinate code submissions among the developers on my team. Our automated nightly build system has a code submission cut-off of 3AM each morning, when it pulls the latest code from the central repo into its own local repository.

Some weeks ago, a build was performed that included Revision 1 of the repo. At that time, the build system did not in any way track the revision of the repository that was used to perform the build (it does now, thankfully).

  -+------- Build Cut-Off Time
   |
   |
   O    Revision 1

An hour before the build cut-off time, a developer branched off the repository and committed a new revision in their own local copy. They did NOT push it back to the central repo before the cut-off and so it was not included in the build. This would be Revision 2 in the graph below.

  -+------- Build Cut-Off Time
   |
   | O  Revision 2
   | |
   | |
   |/
   |
   O    Revision 1

An hour after the build, the developer pushed their changes back to the central repo.

   O    Revision 3
   |\
   | |
  -+-+----- Build Cut-Off Time
   | |
   | O  Revision 2
   | |
   | |
   |/
   |
   O    Revision 1

So, Revision 1 made it into the build, while the changes in Revision 2 would've been included in the following morning's build (as part of Revision 3). So far, so good.

Problem

Now, today, I want to reconstruct the original build. The seemingly obvious steps to do this would be to

  1. determine the revision that was in the original build,
  2. update to that revision, and
  3. perform the build.

The problem comes with Step 1. In the absence of a separately recorded repository revision, how can I definitively determine what revision of the repo was used in the original build? All revisions are on the same named branch and no tags are used.

The log command

  hg log --date "<cutoff_of_original_build" --limit 1

gives Revision 2 - not Revision 1, which was in the original build!

Now, I understand why it does this - Revision 2 is now the revision closest to the build cut-off time - but it doesn't change the fact that I've failed to identify the correct revision on which to rebuild.
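To make the failure concrete, here is a minimal Python sketch of what a date filter plus `--limit 1` effectively selects. The timestamps are hypothetical values chosen to match the graphs above; the only point is that the commit date stored in a changeset says nothing about when that changeset reached the central repo.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Revision(NamedTuple):
    number: int
    committed: datetime  # stamped by the author's machine at commit time


# Hypothetical timestamps matching the graphs in the question.
CUTOFF = datetime(2011, 6, 1, 3, 0)
HISTORY = [
    Revision(1, datetime(2011, 5, 31, 18, 0)),  # in the central repo before the cut-off
    Revision(2, datetime(2011, 6, 1, 2, 0)),    # committed locally, pushed after the cut-off
    Revision(3, datetime(2011, 6, 1, 4, 0)),    # merge, created after the cut-off
]


def newest_before(history: List[Revision], cutoff: datetime) -> Optional[Revision]:
    """Return the newest revision whose commit date precedes the cut-off."""
    matches = [rev for rev in history if rev.committed < cutoff]
    return max(matches, key=lambda rev: rev.number) if matches else None


if __name__ == "__main__":
    print(newest_before(HISTORY, CUTOFF).number)  # prints 2, not 1
```

Revision 2 wins because its commit date is earlier than the cut-off, even though the build machine could not have seen it at that time.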

Thus, if I can't use the --date option of the log command to find the correct historical version, what other means are available to determine the correct one?

kopaka asked Jun 29 '11 22:06


2 Answers

Whatever history the undo files might have held is gone by now (and they are the only thing I can think of that could have given an indication), so I think the only way to narrow it down to a specific revision is a brute-force approach.

If the range of possible revisions is fairly large, and some non-date property of the build output (its size, for example) changes linearly or nearly linearly from revision to revision, you may be able to use the bisect command to do a binary search for the revision you're looking for (or at least get close to it). At each revision where bisect stops, you would build that revision and compare the chosen property against what the scheduled build produced that night. Depending on the test, you might not even need to build.
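As a rough illustration of that binary search, here is a minimal Python sketch. It is not `hg bisect` itself, only the idea behind it, and it rests on a strong assumption: the candidates are ordered oldest first and the measured property never decreases along that order. The `measure` callable is a hypothetical stand-in for "build this revision and measure the output".

```python
from typing import Callable, Sequence


def find_by_monotonic_metric(candidates: Sequence[str],
                             measure: Callable[[str], int],
                             target: int) -> str:
    """Binary-search for the earliest candidate whose metric reaches target.

    Assumes candidates are ordered oldest first and that measure() is
    non-decreasing along that order. The caller should still confirm that
    the returned revision really matches the original build.
    """
    lo, hi = 0, len(candidates) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if measure(candidates[mid]) < target:
            lo = mid + 1   # the wanted revision is newer than mid
        else:
            hi = mid       # mid itself may be the wanted revision
    return candidates[lo]
```

If the property is not monotonic across the range, this search can land on the wrong revision, which is why a final comparison against the original build is still needed.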

If it really is as simple as the graph you depict and the range of possibilities is short, you could just start from the latest revision it might be and walk backwards a few revisions, testing against the original build.

As for a definitive test comparing the two builds, hashing the test build and comparing it to a hash of the original build might work. If a compile on the nightly build machine and a compile on your machine of the same revision do not produce binary-identical builds, you may have to use binary diffing (such as with xdelta or bsdiff) and look for the smallest diff.
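The hash comparison could look something like the following Python sketch. It assumes the original build output and each rebuilt output are available as directories, and that builds are byte-for-byte reproducible; the function names are made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional


def tree_digest(root: Path) -> str:
    """SHA-256 over the relative path and contents of every file under root."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def matching_revision(original: Path, rebuilds: Dict[str, Path]) -> Optional[str]:
    """Return the revision whose rebuilt output hashes like the original, if any."""
    wanted = tree_digest(original)
    for rev, out_dir in rebuilds.items():
        if tree_digest(out_dir) == wanted:
            return rev
    return None
```

A `None` result does not rule a revision out; it may only mean the builds embed timestamps or paths, in which case the binary-diff fallback described above is the next step.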


Mercurial does not have the information you want:

Mercurial does not, out of the box, make it its business to log and track every action performed regarding a repository, such as push, pull, update. If it did, it would be producing a lot of logging information. It does make available hooks that can be used to do that if one so desires.
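For future builds, a hook can record what arrived and when. The sketch below is a small Python script meant to be run as an external `changegroup` hook on the central repository, for example via a `[hooks]` entry such as `changegroup.log = python3 /path/to/log_push.py /path/to/push.log` (both paths are hypothetical). It relies on the `HG_NODE` and `HG_URL` environment variables that Mercurial sets for external hooks; `HG_NODE_LAST` is read too but may be absent on older versions, so it falls back to `?`.

```python
import os
import sys
import time
from typing import Mapping


def format_record(env: Mapping[str, str], now: float) -> str:
    """Build one log line: UTC arrival time plus the incoming changeset range."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(now))
    return "{} first={} last={} url={}\n".format(
        stamp,
        env.get("HG_NODE", "?"),        # first changeset of the incoming group
        env.get("HG_NODE_LAST", "?"),   # last changeset, if the variable is set
        env.get("HG_URL", "?"),         # where the changes came from
    )


def main(log_path: str) -> None:
    """Append a record for the current hook invocation to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(format_record(os.environ, time.time()))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The log line format here is arbitrary; what matters is that the server's own clock, not the commit date, is what gets recorded.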

It also does not care what you do with the contents of the working directory, such as opening files or compiling, so of course it is not going to track that at all. It's simply not what Mercurial does.

It was a mistake to not know exactly what the scheduled build was building. You agree implicitly because you now log that very information. The lack of that information before has simply come back to bite you, and there is no easy way out of it. Mercurial does not have the information you need. If the central repo is just a shared directory rather than a web-hosted repository that might have tracked activity, the only information about what was built is in the compiled version. Whether it is some metadata declared in the source that becomes part of the build, a naive aspect like filesize, or you truly are stuck hashing files, you can't get your answer without some effort.

Maybe you don't need to test every revision; there may be revisions you can be certain are not candidates. The time of the compile gives you only an upper bound on the range of revisions to test: revisions committed after that time could not possibly be candidates. What you don't know is what had been pushed to the server by the time the build server pulled from it. But you do know that revisions from that day are the most likely. You also know that revisions on parallel unnamed branches are less likely candidates than linear revisions and merges. If there are a lot of parallel unnamed branches and you know all your developers merge in a particular way, you might know whether the revisions under parent1 or parent2 should be tested.
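One way to turn those bounds into a candidate list is sketched below in Python. It shells out to `hg log` with a date filter and a template, then keeps only revisions committed within a chosen window before the newest match. The window size and the helper names are assumptions for illustration, not anything Mercurial prescribes.

```python
import subprocess
from typing import List, Tuple

# {date|hgdate} renders as "<unix timestamp> <timezone offset>".
TEMPLATE = "{node} {date|hgdate}\n"


def log_command(cutoff: str) -> List[str]:
    """Command listing every revision committed before cutoff."""
    return ["hg", "log", "--date", "<" + cutoff, "--template", TEMPLATE]


def parse_log(output: str) -> List[Tuple[str, int]]:
    """Turn 'node unixtime offset' lines into (node, unixtime) pairs."""
    rows = []
    for line in output.splitlines():
        node, unixtime, _offset = line.split()
        rows.append((node, int(unixtime)))
    return rows


def within_window(rows: List[Tuple[str, int]], window_seconds: int) -> List[str]:
    """Keep revisions committed at most window_seconds before the newest one."""
    if not rows:
        return []
    newest = max(stamp for _, stamp in rows)
    return [node for node, stamp in rows if newest - stamp <= window_seconds]


def candidates(repo: str, cutoff: str, window_seconds: int) -> List[str]:
    """Run hg in repo and return the narrowed candidate list, newest first."""
    result = subprocess.run(log_command(cutoff), cwd=repo, check=True,
                            capture_output=True, text=True)
    return within_window(parse_log(result.stdout), window_seconds)
```

For example, `candidates("/path/to/repo", "2011-06-01 03:00", 86400)` would return roughly a day's worth of revisions before the cut-off (the path and date are placeholders).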

Maybe you don't even need to compile if there is metadata you can parse from the source code to compare with what you know about the specific build.
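If such metadata exists, it can be read straight out of history with `hg cat`, with no checkout or compile. The Python sketch below assumes, purely for illustration, a C-style `#define BUILD_VERSION "..."` line in some header; the constant name and pattern are hypothetical and would need to match whatever your source actually contains.

```python
import re
import subprocess
from typing import Optional

# Hypothetical pattern, e.g.:  #define BUILD_VERSION "1.4.2"
VERSION_RE = re.compile(r'#define\s+BUILD_VERSION\s+"([^"]+)"')


def extract_version(text: str) -> Optional[str]:
    """Return the version string embedded in the source text, if present."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def version_at(repo: str, rev: str, path: str) -> Optional[str]:
    """Read path as of revision rev via 'hg cat' and extract its version."""
    result = subprocess.run(["hg", "cat", "--rev", rev, path], cwd=repo,
                            check=True, capture_output=True, text=True)
    return extract_version(result.stdout)
```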

And you can automate your search. It would be easiest to do so with a linear search: fewer heuristics to design.
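A linear search of that kind might be driven by something like this Python sketch. The `fingerprint` callable is a hypothetical hook for whatever comparison you settle on (check out, build, then hash or measure); only the `hg update` call is an actual Mercurial command.

```python
import subprocess
from typing import Callable, Iterable, Optional


def checkout(repo: str, rev: str) -> None:
    """Switch the working directory of repo to rev, discarding local changes."""
    subprocess.run(["hg", "update", "--clean", "--rev", rev], cwd=repo, check=True)


def first_match(candidates: Iterable[str],
                fingerprint: Callable[[str], str],
                wanted: str) -> Optional[str]:
    """Walk candidates in the given order and return the first matching one."""
    for rev in candidates:
        if fingerprint(rev) == wanted:
            return rev
    return None
```

A `fingerprint` implementation would typically call `checkout(repo, rev)`, run the build, and return a hash of the output; passing the candidates newest first mirrors the "walk backwards" approach.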

The bottom line is simply that Mercurial does not have a magic button to help in this case.

Joel B Fant answered Oct 11 '22 15:10


Apologies, it's probably bad form to answer one's own question, but there wasn't enough room to properly respond in a comment box.


To Joel, a couple of things:

First - and I mean this sincerely - thanks for your response. You provided an option that was considered, but which was ultimately rejected because it would be too complex to apply to my build environment.

Second, you got a little preachy there. In the question, it was understood that because a separately recorded repository revision was absent, there would be 'some effort' to figure out the correct revision. In a response to Lance's comment (above), I agree that recording the 40-character hexadecimal changeset hash is the 'correct' way of archiving the necessary build info. However, this question was about what CAN be done IF you do not have that information.

To be clear, I posted my question on StackOverflow for two reasons:

  1. I figured that others must have run into this situation before and that, perhaps, someone may have determined a means to get at the requisite information. So, it was worth a shot.
  2. Information sharing. If others run into this problem in the future, they will have an online reference that clearly explained the problem and discussed viable options for remediation.

Solution

In the end, perhaps my greatest thanks should go to Chris Morgan, who prompted me to look at the central server's mercurial-server logs. Using those logs, and some scripting, I was able to definitively determine the set of revisions that had been pushed to the central repository at the time of the build. So, my thanks to Chris and to everyone else who responded.
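The actual scripts are not shown here, and the mercurial-server log format is not reproduced, so the following Python sketch covers only the final step under an assumption: that the log entries have already been extracted into (arrival time, changeset IDs) pairs. Both the record shape and the function name are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Sequence, Tuple

# Hypothetical record shape: when a push arrived, and which changesets it carried.
PushRecord = Tuple[datetime, Sequence[str]]


def visible_at(pushes: Iterable[PushRecord], cutoff: datetime) -> List[str]:
    """Return the changesets that had reached the server by the cut-off, oldest push first."""
    visible: List[str] = []
    for arrived, nodes in sorted(pushes, key=lambda push: push[0]):
        if arrived <= cutoff:
            visible.extend(nodes)
    return visible
```

With the timeline from the question, a push of Revisions 2 and 3 an hour after the cut-off is excluded, leaving Revision 1 as what the build machine could have pulled.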

kopaka answered Oct 11 '22 15:10
