
hg to git conversion and subrepo merge

Despite involving two subparts, I'm asking this as a combined question because the way it's broken down into parts isn't what's important. I'm open to different ways to achieve what I want as long as the end result retains all the meaningful history and ability to check out, study, and build/test historical versions. The goal is to retire hg and the subrepo model that's been used so far and move to a unified tree in git, but without sacrificing history.

What I'm starting with is a Mercurial repository that consists of some top-level code and a number of subrepositories where the bulk of interesting history lies. The subrepos have some branching/merges, but nothing too crazy. The final result I want to achieve is a single git repository, with no submodules, such that:

  • For each commit in the original top-level hg repo, there is a git commit that checks out exactly the same tree as you'd get checking out the corresponding hg commit with all its referenced subrepo commits.

  • These git commits corresponding to successive top-level hg commits are descendants of each other, with commits corresponding to all relevant subrepo commits in between.

The basic idea I have for how to achieve this is to iterate over all top-level hg commits, and for each top-level commit that changes .hgsubstate, also iterate over all paths from the old revision to the new revision for the subrepo (possibly involving branching). At each step:

  • Check out the appropriate hg revisions for top-level and all subrepos.
  • Delete everything from the git index.
  • Stage everything checked out from hg to the git index.
  • Use git-write-tree and git-commit-tree to generate a commit with the desired parents, using authorship, date, and commit message from the corresponding hg commit.
  • Record the correspondence between the new git commit and hg commits for use in generating future commits' parents.
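
The index and commit steps above could be sketched with git plumbing roughly as follows. This is a minimal sketch, not a tested converter. `commit_snapshot` is a hypothetical helper name; it assumes the current directory is a git repository whose work tree already holds the hg checkout, that `.hg` directories are kept out of git (for example via `.git/info/exclude`), and that the date has already been converted to a format git accepts.

```shell
# commit_snapshot MESSAGE NAME EMAIL DATE [PARENT...]
# Hypothetical helper: snapshots the current work tree as a git commit and
# prints the new commit id. Assumes .hg directories are excluded from git
# (e.g. listed in .git/info/exclude).
commit_snapshot() {
    message=$1 name=$2 email=$3 date=$4
    shift 4

    # Build "-p <parent>" arguments for every parent commit id given.
    parent_args=
    for parent in "$@"; do
        parent_args="$parent_args -p $parent"
    done

    git read-tree --empty          # delete everything from the index
    git add -A .                   # stage everything that is checked out
    tree=$(git write-tree)         # turn the index into a tree object

    # Reuse authorship and date from the corresponding hg commit.
    GIT_AUTHOR_NAME=$name GIT_AUTHOR_EMAIL=$email GIT_AUTHOR_DATE=$date \
    GIT_COMMITTER_NAME=$name GIT_COMMITTER_EMAIL=$email GIT_COMMITTER_DATE=$date \
        git commit-tree $parent_args -m "$message" "$tree"
}
```

The printed commit id is what would be recorded against the hg revision, so that later calls can pass it as a parent, e.g. `c2=$(commit_snapshot "msg" "A Dev" dev@example.com "1462900100 +0000" "$c1")`.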

Should this work? Is there a better way to achieve what I want, perhaps doing the subrepo collapse with hg first? The biggest thing I'm not clear on is how to perform the desired iteration, so practical advice for how to achieve it would be great.

One additional constraint: the original repos involve content which can't be published (this is an additional git-filter-branch step once the basic conversion is done), so solutions that involve uploading the repo for processing by a third party are not viable.

R.. GitHub STOP HELPING ICE asked May 10 '16 17:05


2 Answers

It seems what I was missing from my question and discussion of possible solutions was a proper understanding of the graph theory involved. Ideas like "iterate over all paths from the old revision to the new revision" were not really well-defined, or at least didn't reflect what I expected them to reflect. Coming at it from a more rigorous standpoint, I think I have an approach that works.

To begin with, the problem: Subrepo revisions only represent the state of their own subtrees at a given point in history. I want to map them to revisions that represent the state of the whole combined tree. Then the subrepo DAGs can be merged with the top-level DAG in a meaningful way.

For a given subrepo revision R, we can ask what top-level-repo (or parent-repo, if we had multiple levels of subrepos) revisions include R or any descendant of R. Assuming a single root, this set of revisions has a Lowest Common Ancestor (or maybe more than one), which seems like a good candidate. Indeed, if the top-level revision S we use with R is not a common ancestor of revisions which use R or its descendants (but the mapping is otherwise reasonable), then R will have a descendant R' whose associated top-level revision S' is not a descendant of S. In other words, the history derived from the subrepo will have confusing/nonsensical jumps between revisions of the top-level tree.

Now, if we want to choose a common ancestor, the lowest one makes sense from a standpoint of making these revisions something that can be checked-out, built, and tested, and from a standpoint of giving a reasonable idea what the state of the top-level repo (and other subrepos) was at the time the changes in the subrepo were made. The root of the whole top-level DAG would of course also work, but it would not give meaningful, usable revisions that could be checked out; choosing the root would be equivalent (from a usability standpoint) to a naive repo-merge that has one root per subrepo and just merges from the subrepo histories whenever the top-level repo updates the revisions it's using.

So, if we can use the LCA to assign a top-level revision T(R) to each subrepo revision R, how does that translate into the structure of the converted history?

Whenever a subrepo revision R has T(R) distinct from T(P) for each parent P of R, it's effectively merging new changes from the top-level repo (and other subrepos) into the subrepo history. The conversion should represent this as two commits:

  1. The actual subrepo commit R, using an old top-level revision. If R has a single parent P (not a merge commit), this will be T(P). If R has multiple parents, it's not clear whether there's a perfect choice of which one to use, but T(P) for any parent P should be reasonable.

  2. A merge commit merging back the conversion C(T(R)) of the top-level-repo commit T(R) associated with R, where C(T(R)) itself just merged (1) above.

Aside from C(T(R)), which references (1) as a merge parent, all other references to R in the conversion should use (2). This includes the conversions of any descendants of T(R) in the top-level repo which use revision R of this subrepo, and the conversions of direct children of R itself.

I believe the above (albeit poorly worded) description specifies all that's needed for merging the top-level and subrepo DAGs. Each subrepo revision gets a full version of the tree, and ends up connected into a unified DAG for the converted repo via "merge commits" (when the subrepo merges a new associated top-level revision, and when the top-level merges subrepo revisions that have changed).

The final step of producing the git repo, then, is simply replaying the merged DAG, either in topologically sorted form or via a depth-first walk, such that each git commit-tree already has all the parent revisions it needs present.
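
For the topologically sorted variant, the standard `tsort` utility may already be enough. The sketch below assumes the merged DAG has been written to a text file with one "parent child" pair per line; that file format and the names `replay_order` and `convert_one` are hypothetical and not part of any existing tool.

```shell
# replay_order EDGES_FILE
# EDGES_FILE is a hypothetical text file with one "parent child" pair per line
# describing the merged DAG. A root is also listed as "root root", which tsort
# treats as "this item exists" without adding an ordering constraint.
# Prints every revision on its own line, after all of its parents.
replay_order() {
    tsort "$1"
}

# Possible use, with convert_one standing in for whatever builds the tree and
# calls git commit-tree for a single revision:
#
#   replay_order edges.txt | while read -r rev; do
#       convert_one "$rev"
#   done
```

When several orders are valid, which one `tsort` prints is up to the implementation, but each commit's parents always come out before it, which is the only property the replay needs.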

R.. GitHub STOP HELPING ICE answered Sep 27 '22 18:09


What you have written might or might not solve the issue, but it isn't simple. The main issue is that you need the commits in order so that your subrepos and main repo are consistent. I recreated this problem on a small scale and was able to keep the subrepos consistent with each other as well.

My solution:

  1. Using the hg convert extension, I converted the main repo to a repo without subrepos (and related information).

    cd main
    awk '{ print  $1}'  .hgsub | xargs -n 1 echo 'exclude'  > ../filemap
    echo exclude .hgsub >> ../filemap
    echo exclude .hgsubstate >> ../filemap
    cd ..
    hg convert --filemap filemap  main mainConv
    cd mainConv
    hg update
    
  2. Convert the subrepo, using a rename directive in its --filemap so that its files end up under the subrepo's directory.

    cd ..
    echo rename . subRepo > subFileMap
    hg convert --filemap subFileMap main/subRepo subRepoConv
    cd subRepoConv
    hg update
    
  3. Pull the converted subrepo into the converted main repo.

    cd ../mainConv
    hg pull -f ../subRepoConv
    
  4. You will notice multiple heads in the repo after pulling (because each subrepo has its own head). Merge them:

     hg heads
     hg merge <RevID from subrepo (not main repo)>
     hg ci -mMergeOfSubRepo
    

You have to repeat steps 2 to 4 for every subrepo.

  5. The commits won't be sorted by date, so put them in order as done in https://stackoverflow.com/a/16012597:

     cd .. 
     hg clone -r 0 mainConv mainOrdered
     cd mainOrdered
     # Log and pull from mainConv (not main), so the clone stays related to it.
     for REV in $(hg log -R ../mainConv -r 'sort(1:tip, date)' --template '{rev}\n')
     do
         hg pull ../mainConv -r "$REV"
     done
    

Now convert this ordered mercurial repo to git using http://repo.or.cz/w/fast-export.git:

    cd ..
    git clone git://repo.or.cz/fast-export.git
    git init mainGit
    cd mainGit
    ../fast-export/hg-fast-export.sh -r ../mainOrdered
    git checkout HEAD

khrm answered Sep 27 '22 18:09