
git equivalent to hg rebase -s source -d destination?

Tags: git, rebase

Is there a git equivalent to hg rebase -s source -d newparent?

That is, 'prune' a branch at source and 'graft' it at newparent. Or reparent source on newparent (merging where appropriate).

Or how to go for example from this:

A - B - C
 \
  D - E
   \
    F

To this:

A - B - C
     \
      D'- E'
       \
        F'

In this case, source is D, and newparent is B. Doing hg rebase -s D -d B produces the desired result. Is there a git equivalent?

I've tried git rebase --onto B D, but apparently it didn't do anything apart from moving branch labels around.

Edited for clarification: The goal is not to reparent a commit in a tree exactly like the above. The above is an example. The goal is to let me reparent a commit on top of any other commit, as long as there are no weird situations like trying to reparent a merge commit or similar. I've created a couple of scripts that recreate the tree above, one for hg:

#!/bin/sh
set -e
rm -rf .hg
hg init
cat > .hg/hgrc <<'EOF'
[ui]
username = Rebase tester <no@email>
[extensions]
rebase =
EOF
echo A > file.txt
hg add file.txt
hg commit -m A_msg
hg bookmark A_bm
hg bookmark dummy # to stop A from tracking us
echo B > file.txt
hg commit -m B_msg
hg bookmark B_bm
hg bookmark graftpoint
hg bookmark -f dummy
echo C > file.txt
hg commit -m C_msg
hg bookmark C_bm
hg checkout A_bm
hg bookmark -f dummy
echo D > file.txt
hg commit -m D_msg
hg bookmark D_bm
hg bookmark prunepoint
hg bookmark -f dummy
echo E > file.txt
hg commit -m E_msg
hg bookmark E_bm
hg checkout D_bm
hg bookmark -f dummy
echo F > file.txt
hg commit -m F_msg
hg bookmark F_bm
hg bookmark -d dummy
hg log -G -T '{desc} {bookmarks} {rev}:{node|short}'
hg rebase -s D_bm -d B_bm -t internal:other
hg log -G -T '{desc} {bookmarks} {rev}:{node|short}'

and one for git:

#!/bin/sh
set -e
rm -rf .git
git init
git config user.name 'Rebase tester'
git config user.email 'no@email'
echo A > file.txt
git add file.txt
git commit -m A_msg
git branch A_br
echo B > file.txt
git commit -a -m B_msg
git branch B_br
git branch graftpoint
echo C > file.txt
git commit -a -m C_msg
git branch C_br
git checkout A_br
git checkout -b D_br
echo D > file.txt
git commit -a -m D_msg
git branch prunepoint
git checkout -b E_br
echo E > file.txt
git commit -a -m E_msg
git checkout D_br
git checkout -b F_br
echo F > file.txt
git commit -a -m F_msg
git log --graph --all --format=format:'%s %d %h%n'

#insert command(s) here

git log --graph --all --format=format:'%s %d %h%n'

The trees output by the first script are:

@  F_msg F_bm 5:5ffe9c283d51
|
| o  E_msg E_bm 4:9f83c609d7b2
|/
o  D_msg D_bm prunepoint 3:c3561e22f394
|
| o  C_msg C_bm 2:e7dd832a739b
| |
| o  B_msg B_bm graftpoint 1:c3d6803dba3e
|/
o  A_msg A_bm 0:f52f4706cef0

and

@  F_msg F_bm 5:efe4fde4dcdf
|
| o  E_msg E_bm 4:b2402cb25f70
|/
o  D_msg D_bm prunepoint 3:5849595efdde
|
| o  C_msg C_bm 2:e7dd832a739b
|/
o  B_msg B_bm graftpoint 1:c3d6803dba3e
|
o  A_msg A_bm 0:f52f4706cef0

exactly as expected. The output of the second script is

* C_msg  (master, C_br) de97063
|  
* B_msg  (B_br) 6053c6b
|    
| * E_msg  (E_br) 13d4fac
| |     
| | * F_msg  (HEAD, F_br) b9ce3c4
| |/  
| |   
| * D_msg  (D_br) ed2ba19
|/  
|  
* A_msg  (A_br) 2cf9476

(twice). The second one should be something like:

* C_msg  (master, C_br) de97063
|    
| * E_msg  (E_br) 1398dc5
| |     
| | * F_msg  (HEAD, F_br) 8ee34ad
| |/  
| |   
| * D_msg  (D_br) ed873f7
|/  
|  
* B_msg  (B_br) 6053c6b
|  
* A_msg  (A_br) 2cf9476

My problem is that hg rebase -s source -d destination works in any situation, but I haven't found a way to do the same with git. I've found a couple of third-party programs, but they don't seem to address this use case. One is git reparent and the other is git-reparent-branch. I've also found a solution using grafts and filter-branch, but it's not apparent to me that it would handle conflicts correctly.

Pedro Gimeno asked Oct 29 '16 22:10


1 Answer

This can be done in Git, but it's more complicated. To understand why, and therefore get to how, we need to review a key difference between Mercurial and Git.

[Edit, a day or two later: I hate to make this longer, but I think I can summarize the problem in two key points now. It boils down to:

  1. Mercurial allows multiple heads—Mercurial's notion of a tip of a branch is called a head—within a branch. When this situation occurs, Mercurial just deals with it, because it can and must.

  2. Git's design makes it impossible, by definition, to have multiple tip commits on the same branch (a tip is Git's name for the single commit a branch name points to). This means Git has no equivalent, does not have to deal with it, and simply doesn't try. But we can still get the result we want using Git's built-in tools; it just gets messy.

The remainder is the detailed explanation, along with a rather clumsy way to do the job with existing Git tools. What's really needed, to do what hg rebase does with just one command, is a better Git tool, but as far as I know it does not exist. I wanted it for a while, and started writing it, but then the use case itself went away and I left it as a prototype that did only what I needed at the time.]

Branches vs commits

In Git, a branch (name) is merely a pointer to a single commit:

...--A--B--C--D   <-- branch1
         \
          E--F    <-- branch2

The name branch1 is just a pointer, remembering the raw hash ID of commit D. The name branch2 is also just a pointer, remembering the raw hash ID of commit F.

Every commit has its own identity, independent of any branch name. Commits C and D are reachable only via branch1, commits E and F are reachable only via branch2, and commits A and B are reachable from both names, so A and B are on both branches. In Git, that's what it means for a commit to be "on a branch": it is reachable from that branch's tip.
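As a minimal sketch of this reachability rule, the script below builds the two-branch graph in a throwaway repository and asks Git which branch names can reach a given commit. The temporary directory, the `commit` helper, and the one-file-per-commit layout are assumptions made for the illustration; `git branch --format` needs a reasonably recent Git.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical helper and file names.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/branch1   # name the first branch explicitly
git config user.name 'Rebase tester'
git config user.email 'no@email'

commit() {  # commit <name>: add one new file so commits never conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "${1}_msg"
}

commit A; commit B
git branch branch2            # branch2 starts at B
commit C; commit D            # branch1 now points to D
git checkout -q branch2
commit E; commit F            # branch2 now points to F

# Each branch name resolves to exactly one commit hash.
git rev-parse branch1 branch2

# Which branch names can reach commit B (two steps back from F)?
on_b=$(git branch --contains branch2~2 --format='%(refname:short)' | sort | xargs)
# Which branch names can reach commit D?
on_d=$(git branch --contains branch1 --format='%(refname:short)' | sort | xargs)
echo "B is on: $on_b"
echo "D is on: $on_d"
```

Moving or deleting a branch name changes these answers without touching any commit, which is the sense in which membership is volatile in Git.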

In Mercurial, things are very different. A branch (name) is a very solid entity, and we can draw this graph like this instead:

branch1: ...--A--B--C--D
                  \
branch2:           E--F

Here, commits A through D are on branch1. None of them are on branch2. They can never be on branch2. Commits E and F are on branch2 and they are forever stuck on branch2 and will never be on branch1. It's true that we can merge branch2 back into branch1, making commits E and F reachable—but in Mercurial, reachability has no effect on the grouping of commits into branches. Commits are made on a branch, and are forevermore glued to that branch.

This, of course, means that pruning and re-grafting commits in Mercurial makes it obvious that the commits are copied. The new copies are on the other branch: they're clearly different from the originals. The meaning of the phrase "commit X is on branch Y" is permanent and unchanging. A commit's identity depends on its branch.

In Git, however, the meaning of "commit X is on branch Y" is volatile. The commit is on the branch only as long as the branch label, which is a temporary and moveable thing, makes the commit reachable. Commits can be on many branches simultaneously, or even on no branch at all. A commit has identity independent of any branch label.

This enables Mercurial to have "multiple heads" within a branch

What Git calls a tip commit, Mercurial calls a head. Let me redraw your example, but Git-ified:

A--B--C   <-- tip1
 \
  D--E    <-- tip2
   \
    F     <-- tip3

There are three tip commits here, hence three branches, named tip1 through tip3. These identify commits C, E, and F.

Mercurial sticks commits into branches. This allows us to have a fork within a branch. I can't "do" color, but assume the first line A--B--C is in, say, yellow and the remaining lines are in green, denoting which commits are on which branches:

branch1:  A--B--C
           \
branch2:    D--E
  ...        \
branch2:      F

Here, branch2 contains both commits E and F, even though they are different heads (what Git would call "tips"). This situation is impossible in Git, because two different tip commits need two different branch-names to point to them. You can't draw one single arrow, coming from the right, that points to both E and F, which are both in the "green zone" (branch2).

Addendum (per edit): even "intra-branch", Hg has more information than Git

Even if all the commits are on one branch, Mercurial's internals give it a direct ability that Git lacks. We can point to a commit (by hash ID, sequential number, or Mercurial bookmark) and ask for "all descendant heads that are in this branch". Those are all the head commits whose branch is the current branch (and whose sequence number is greater, although that's an optimization) for which the given commit is an ancestor. (Usually we consider a commit its own descendant and ancestor, and we would here too.) This gives us (or hg) a (fast) way to find all "interesting" heads, and hence all the commits to rebase.

Git's commits have no equivalent: a commit records only its parents, so there is no direct way to ask which commits are descendants of some commit. Instead, we can only tell which commits are ancestors, by following internal commit IDs backwards (from commits to their parents). The closest we can get to Mercurial's ability is to say "starting from some given branch tip(s) and working backwards, see which branch-tips have this commit as their ancestor; use all those branch-tips." (Of course, --branches would suffice here, but that is something git rebase doesn't do. It's also pretty slow.)
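The "work backwards from every branch tip" approach can be approximated with existing commands. The sketch below rebuilds the example graph with branches tip1 through tip3 and uses `git for-each-ref --contains` to list the branch tips that have commit D as an ancestor. The repository location, the `commit` helper, and the file names are assumptions for the illustration, and this is only one possible way to collect the tips.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical helper and file names.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/tip1
git config user.name 'Rebase tester'
git config user.email 'no@email'

commit() {  # commit <name>: add one new file so commits never conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "${1}_msg"
}

commit A
git branch tip2               # tip2 will grow from A
commit B; commit C            # tip1 -> C
git checkout -q tip2
commit D
git branch tip3               # tip3 will grow from D
commit E                      # tip2 -> E
git checkout -q tip3
commit F                      # tip3 -> F

d=$(git rev-parse tip3^)      # commit D, the "source" of the rebase

# Branch tips that have D as an ancestor: the candidates to rebase.
tips=$(git for-each-ref --contains "$d" --format='%(refname:short)' refs/heads | xargs)
echo "tips descending from D: $tips"
```

Git has to walk back from each tip to answer this, which is why it is slower than Mercurial's lookup.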

How to get what you want in Git

Because Git does not have multiple heads, and Git's branches are so ephemeral, we must start with our Git-specific drawing with three branches named tip1 through tip3. We can then rebase either tip2 or tip3: the choice is arbitrary.

Just as in Mercurial, rebasing means copying. Let's rebase tip2 to get D' and E'. We start with this, which I've redrawn a bit to leave some more room:

A--B--C    <-- tip1
 \
  \
   \
    D--E   <-- tip2
     \
      F    <-- tip3

Now we run:

$ git checkout tip2 && git rebase tip1

This first gets us on branch tip2, as git status would say, so that our rebase will affect branch-pointer tip2. Then, it instructs Git to find commits that are reachable from the current branch (tip2) but not reachable from the given branch tip1. These are commits D and E. Git then copies these two commits, placing the copies after the --onto argument.

We didn't give an --onto argument but it defaults to the argument we did give, which is tip1; and tip1 points to commit C. So the copies are placed after C. The last step of the rebase is to abandon the original chain of commits (though ORIG_HEAD and the reflog for tip2 will remember them for a while) and make the current branch, i.e., tip2, point to the final copied commit, i.e., E':

A--B--C        <-- tip1
 \     \
  \     D'-E'  <-- tip2
   \
    D--E       [ORIG_HEAD]
     \
      F        <-- tip3
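This first step can be reproduced in a scratch repository. The following is a sketch under assumptions: the `commit` helper and the one-file-per-commit layout are made up so that no copy can conflict, unlike the single file.txt in the question's script.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical helper and file names.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/tip1
git config user.name 'Rebase tester'
git config user.email 'no@email'

commit() {  # commit <name>: add one new file so commits never conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "${1}_msg"
}

commit A
git branch tip2
commit B; commit C            # tip1 -> C
git checkout -q tip2
commit D
git branch tip3
commit E                      # tip2 -> E
git checkout -q tip3
commit F                      # tip3 -> F

old_e=$(git rev-parse tip2)   # remember the original E
old_d=$(git rev-parse tip2^)  # and the original D

git checkout -q tip2 && git rebase -q tip1

# tip2 now names E', whose grandparent is C (the tip of tip1).
git log --graph --format='%s %d %h' tip1 tip2 tip3
echo "ORIG_HEAD: $(git rev-parse --short ORIG_HEAD)"
```

After the rebase, tip3 still hangs off the original D, which is why a second step is needed.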

We're halfway done. Now we hit the hard part: we need to rebase tip3 as well. We want our new copy of F' to come after commit D'. This means we must find the ID of D'.

Finding this ID is a bit tricky. In this case, it's easy enough: it's the parent commit of the new E', and tip2 points to E', so we just need to name the parent of tip2, for which any of these syntaxes work:

tip2^     # equivalent to tip2^1
tip2^1    # the first (and only) parent of the commit found via tip2
tip2~     # equivalent to tip2~1
tip2~1    # the commit found by moving one first-parent step back

(The difference between the ^ and ~ syntax is useful when you are crossing merge commits, which have more than one parent. If we wanted to move more commits back in a longer chain, we could repeat ^ many times, e.g., foo^^^^^, or use the ~ syntax: foo~5. For cases like this, use whichever one you find easier to type.)
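To see where ^ and ~ differ, here is a small sketch with one merge commit. The branch names main and side, the `commit` helper, and the file names are assumptions for the illustration.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical names.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Rebase tester'
git config user.email 'no@email'

commit() {  # commit <name>: add one new file so commits never conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "${1}_msg"
}

commit A
git branch side
commit B                          # main: A--B
git checkout -q side
commit S                          # side: A--S
git checkout -q main
git merge -q --no-ff -m M side    # M has parents B (first) and S (second)

first=$(git rev-parse 'main^1')   # first parent of M: commit B
second=$(git rev-parse 'main^2')  # second parent of M: commit S
back2=$(git rev-parse 'main~2')   # two first-parent steps back: commit A

git log --graph --format='%s %h' main
```

On a chain without merges, foo^ and foo~1 always name the same commit; ^2 only makes sense at a merge.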

The naïve attempt here—which usually works—is simply to run:

$ git checkout tip3 && git rebase tip2^   # I find ^ easier to type

This finds commits reachable from tip3 but not from tip2^. Those are, of course, commit F itself, and (uh oh) commit D. Note that when we start from tip2^ and work backwards, we go from D' to C to B to A, never passing through the original D. So this rebase selects both D and F for copying, rather than just commit F. The copies will go after D', which is the commit we identified by writing tip2^.

This seems like a disaster: won't we get a new copy D''? Sometimes we will, and that is a (small) disaster. But when git rebase is doing its copies, it first checks whether the commit it's copying (which, remember, is D) already has a copy in the list of commits it should skip. It makes this check by comparing the changes each commit introduces (its patch ID), not the commit hashes.

The list of commits it should skip is D' (tip2^), C (tip2^^ or tip2~2), and B (tip2~3). And, what do you know, D' is a copy of D. As long as git rebase can figure this out, it skips copying D after all. The result is:

A--B--C        <-- tip1
 \     \
  \     D'-E'  <-- tip2
   \     \
    D     F'   <-- tip3
     \
      F        [ORIG_HEAD]

(What happened to E here? The answer is: I'm not drawing in any of the reflog entries. Normally git log skips them, so I am skipping them too. I am only including ORIG_HEAD, the special name that git rebase leaves behind. The old ORIG_HEAD pointed to E, but the new rebase over-wrote it, so now we only see commit F—and even then, only if we use git log --all.)
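The copy detection can be observed directly. The sketch below repeats the first rebase, compares the patch IDs of D and D' with `git patch-id`, asks `git cherry` which commits already have an equivalent upstream, and then runs the naive second rebase. The `commit` helper and the per-commit files are assumptions chosen so that the first rebase needs no conflict resolution.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical helper and file names.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/tip1
git config user.name 'Rebase tester'
git config user.email 'no@email'

commit() {  # commit <name>: add one new file so commits never conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "${1}_msg"
}

commit A
git branch tip2
commit B; commit C
git checkout -q tip2
commit D
git branch tip3
commit E
git checkout -q tip3
commit F

git checkout -q tip2 && git rebase -q tip1        # first step: D', E'

# D (tip3^) and D' (tip2^) have different hashes but the same patch ID.
id_d=$(git show tip3^ | git patch-id | cut -d' ' -f1)
id_d2=$(git show tip2^ | git patch-id | cut -d' ' -f1)

# git cherry marks commits with an equivalent upstream as "-", others as "+".
skipped=$(git cherry tip2^ tip3 | grep -c '^-')   # D
to_copy=$(git cherry tip2^ tip3 | grep -c '^+')   # F

git checkout -q tip3 && git rebase -q tip2^       # naive second step
git log --graph --format='%s %d %h' tip1 tip2 tip3
```

If copying D had required a manual conflict resolution, the two patch IDs would differ and the naive rebase would try to copy D again.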

Now, there are cases when git rebase can't figure out that D got copied to D'. Specifically, these occur if the first rebase—the one that made tip2 point to E'—had a conflict that you had to manually resolve while copying D.

In this case, you need a smarter git rebase command, instead of the naïve version. This is when you need the --onto argument:

$ git checkout tip3 && git rebase --onto tip2^ tip3^

This git rebase takes two parameters:

  • A set of commits to exclude: that's tip3^, i.e., commit D and everything earlier.
  • A place to start copying after: that's tip2^, i.e., commit D'.

This tells git rebase to copy commit F, but not D or anything before D, and place the copies after D'.
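Putting the pieces together for the question's graph, here is a sketch of the two-step recipe that grafts D onto B (the equivalent of hg rebase -s D -d B) using --onto for both steps. The branch names follow the question's script, but the `commit` helper and the one-file-per-commit layout are assumptions; with the question's single file.txt every copy would stop on a conflict that has to be resolved by hand.

```shell
#!/bin/sh
# Sketch only: throwaway repository, hypothetical helper and file names.
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/C_br
git config user.name 'Rebase tester'
git config user.email 'no@email'

commit() {  # commit <name>: add one new file so commits never conflict
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "${1}_msg"
}

commit A
git branch D_br               # D_br will grow from A
commit B
git branch B_br               # B: the new parent (graft point)
commit C                      # C_br -> C
git checkout -q D_br
commit D                      # D: the source (prune point)
git branch E_br
git branch F_br
git checkout -q E_br; commit E
git checkout -q F_br; commit F

# Step 1: copy D and E (everything after D's parent) onto B; E_br -> E'.
git rebase -q --onto B_br D_br^ E_br
# Step 2: copy only F (everything after the old D) onto D'; F_br -> F'.
git rebase -q --onto E_br^ D_br F_br
# Step 3: move the D_br label from the old D to its copy D'.
git branch -f D_br E_br^

git log --graph --all --format='%s %d %h'
```

Each additional branch tip that descends from D needs its own --onto step, which is the messy part a dedicated tool would automate.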

It would be nice if we could tell git rebase to do multiple branch-tips, but we can't. It would be nice if we could tell Git to figure out where D and D' are automatically, and here we have a bit more luck, but it's still tricky. A few years ago, I started to write some code along these lines, but I abandoned the effort when I was getting too little gain for too much pain. The cases I really cared about were already being handled by the copy-detection during a naïve-style git rebase.

torek answered Sep 18 '22 23:09