 

How to change the starting point of a branch?

Usually I create a branch by running a command like git checkout -b [branch-name] [starting-branch]. In one case, I forgot to include the starting-branch, and now I want to correct it. How do I do so after the branch has already been created?

asked Jul 17 '16 at 23:07 by Daniel Kobe





The short answer is that once you have some commits, you want to git rebase them, using the long form of git rebase: git rebase --onto newbase upstream. To find out how to identify each of these, see the (very) long answer below. (Unfortunately, it got a bit out of hand and I do not have time to shorten it.)

The problem you have here is that in Git, branches don't have a "starting point"—at least, not in any useful way.

The term "branch", in Git, is ambiguous

The first issue here is that, in Git, the word "branch" has at least two distinct meanings. Usually, when we talk loosely about "the branch", it's clear from context whether we mean the branch name—the thing that's a word like master or develop or feature-X—or the thing that I call "branch ancestry" or "branch structure", or more informally, a "DAGlet".1 See also What exactly do we mean by "branch"?

In this particular case, unfortunately, you mean both of these, at the same time.


1The term DAG is short for Directed Acyclic Graph, which is what the commit graph is: a set of vertices or nodes, and directional (from child to parent) edges, such that there are no cycles through the directed edges from any node back to itself. To this I simply add the "-let" diminutive suffix. The resulting word has a happy resemblance to the word aglet, plus a certain assonance with the word "dagger", making it sound slightly dangerous: "Is this a DAGlet which I see before me?"

Draw your commit graph

Whenever you need to grapple with these issues, it helps to draw a graph of what you have now, or at least some useful subset of what you have now. There are of course many ways to draw this (see that linked question for several options, including some bad ones :-) ), but in plain text in a StackOverflow answer, I generally draw them like this:

...--o--o--o           <-- master
         \
          o--o--o--o   <-- develop

The round o nodes represent commits, and the branch names master and develop point to one specific tip commit on each branch.

In Git, every commit points back to its parent commit(s), and this is how Git forms branch structures. By "branch structures", I mean here particular subsets of the overall ancestry part of the graph, or what I call the DAGlets. The name master points to the tip-most commit of the master branch, and that commit points back (leftward) to another commit that is the previous commit on the branch, and that commit points leftward again, and so on.

When we need to talk about specific commits within this graph, we can use their actual names, which are the big ugly 40-character hashes that identify each Git object. Those are really clumsy though, so what I do here is replace the little round os with uppercase letters:

...--A--B--C           <-- master
         \
          D--E--F--G   <-- develop

and now it's easy to say, e.g., that the name master points to commit C, and C points to B, and B points to A, which points back to more history that we don't really care about and hence just left as ....
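As a rough illustration of this pointer structure, here is a toy Python model of the lettered drawing. It is not how Git stores anything: `parents`, `branches`, and `walk_back` are hypothetical names, and commit A is treated as a root so the `...` history can stay elided.

```python
# Toy model of the drawing above, not Git's on-disk format: each commit lists
# its parent IDs, and each branch name points at exactly one tip commit.
parents = {
    "A": [],      # stands in for "...": treated as a root commit here
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
    "F": ["E"],
    "G": ["F"],
}
branches = {"master": "C", "develop": "G"}


def walk_back(commit):
    """Follow first-parent links leftward from `commit`, listing each commit."""
    chain = []
    while commit is not None:
        chain.append(commit)
        commit = parents[commit][0] if parents[commit] else None
    return chain


print(walk_back(branches["master"]))   # ['C', 'B', 'A']
print(walk_back(branches["develop"]))  # ['G', 'F', 'E', 'D', 'B', 'A']
```

Walking back from develop's tip runs straight through B and A; nothing in the model marks D as the place where develop begins.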

Where does a branch begin?

Now, it's perfectly obvious, to you and me, based on this graph drawing, that branch develop, whose tip commit is G, starts at commit D. But it's not obvious to Git—and if we draw the same graph a little differently, it may be less obvious to you and me too. For instance, look at this drawing:

          o             <-- X
         /
...--o--o--o--o--o--o   <-- Y

Obviously branch X has just the one commit and the main line is Y, right? But let's put some letters in:

          C             <-- X
         /
...--A--B--D--E--F--G   <-- Y

and then move Y down a line:

          C            <-- X
         /
...--A--B
         \
          D--E--F--G   <-- Y

Now look what happens if we move C down to the main line, and realize that X is master and Y is develop. Which branch is commit B on, after all?

In Git, commits may be on many branches simultaneously; DAGlets are up to you

Git's answer to this dilemma is that commits A and B are on both branches. The beginning of branch X is way off to the left, in the ... part. But so is the beginning of branch Y. As far as Git is concerned, a branch "starts" at whatever root commit(s) it can find in the graph.

This is important to keep in mind in general. Git has no real concept of where a branch "started", so we wind up having to give it extra information. Sometimes that information is implied, and sometimes it is explicit. It's also important, in general, to remember that commits are often on many branches—so instead of specifying branches, we usually specify commits.

We just often use branch names to do this. But if we give Git just a branch name, and tell it to find all the ancestors of the tip commit of that branch, Git goes all the way back in history.

In your case, if you write the name develop and ask Git to select that commit and its ancestors, you get commits D-E-F-G (which you wanted) and commit B, and commit A, and so on (which you didn't). The trick, then, is to somehow identify which commits you don't want, along with which commits you do.
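A small extension of the same toy model shows that branch membership is computed, not stored: a commit is "on" every branch whose tip can reach it. The helpers `ancestors` and `branches_containing` are hypothetical; the latter is only loosely analogous to what `git branch --contains` reports.

```python
parents = {"A": [], "B": ["A"], "C": ["B"],
           "D": ["B"], "E": ["D"], "F": ["E"], "G": ["F"]}
branches = {"master": "C", "develop": "G"}


def ancestors(commit):
    """Every commit reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def branches_containing(commit):
    """Branch names whose tip can reach `commit`."""
    return sorted(name for name, tip in branches.items()
                  if commit in ancestors(tip))


print(branches_containing("B"))  # ['develop', 'master']
print(branches_containing("E"))  # ['develop']
print(sorted(ancestors(branches["develop"])))  # ['A', 'B', 'D', 'E', 'F', 'G']
```

Asking for develop and all its ancestors yields B and A along with D-E-F-G, which is exactly the "too many" problem described above.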

Normally we use the two-dot X..Y syntax

With most Git commands, when we want to select some particular DAGlet, we use the two-dot syntax described in gitrevisions, such as master..develop. Most2 Git commands that work on multiple commits treat this as: "Select all commits starting from the tip of the develop branch, then subtract from that set the set of all commits starting from the tip of the master branch." Look back at our graph drawing of master and develop: this says "do take commits starting from G and working backwards" (which alone gets us too many, since it includes commits B and A and earlier) "but exclude commits starting from C and working backwards." It's that exclude part that gets us what we want.

Hence, writing master..develop is how we name commits D-E-F-G, and have Git compute that automatically for us, without having to first sit down and draw out a big chunk of the graph.
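In the same toy model (hypothetical names again), the two-dot selection used by log-style commands is plain set subtraction over reachable commits:

```python
parents = {"A": [], "B": ["A"], "C": ["B"],
           "D": ["B"], "E": ["D"], "F": ["E"], "G": ["F"]}
branches = {"master": "C", "develop": "G"}


def ancestors(commit):
    """Every commit reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def two_dot(x, y):
    """X..Y as ancestors(Y) minus ancestors(X); X and Y may be names or IDs."""
    return ancestors(branches.get(y, y)) - ancestors(branches.get(x, x))


print(sorted(two_dot("master", "develop")))  # ['D', 'E', 'F', 'G']
print(sorted(two_dot("develop", "master")))  # ['C']
```

Swapping the operands selects C, the one commit reachable from master but not from develop.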


2Two notable exceptions are git rebase, which is in its own section just below, and git diff. The git diff command treats X..Y as simply meaning X Y, i.e., it effectively just ignores the two dots entirely. Note that this has a very different effect than set subtraction: in our case, git diff master..develop simply diffs the tree for commit C against the tree for commit G, even though master..develop never has commit C in the first set.

In other words, mathematically speaking, master..develop is normally ancestors(develop) - ancestors(master), where the ancestors function includes the specified commit, i.e., is testing ≤ rather than just <. Note that ancestors(develop) does not include commit C at all. The set subtraction operation simply ignores the presence of C in the set ancestors(master). But when you feed this to git diff, it does not ignore C: it does not diff, say, B against G. While that might be a reasonable thing to do, git diff instead steals the three-dot master...develop syntax to accomplish this.
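To make the footnote concrete, this toy sketch reports which pair of commits each `git diff` form compares in the lettered graph. `merge_bases` is a simplified, hypothetical stand-in for `git merge-base` (common ancestors that are not themselves ancestors of another common ancestor), and the code assumes there is exactly one merge base.

```python
parents = {"A": [], "B": ["A"], "C": ["B"],
           "D": ["B"], "E": ["D"], "F": ["E"], "G": ["F"]}


def ancestors(commit):
    """Every commit reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(x, y):
    """Common ancestors that no other common ancestor can reach."""
    common = ancestors(x) & ancestors(y)
    return {c for c in common
            if not any(o != c and c in ancestors(o) for o in common)}


def diff_endpoints(x, y, dots):
    """Which two commits `git diff X<dots>Y` compares, in this toy model."""
    if dots == "..":
        return (x, y)                  # same as `git diff X Y`
    if dots == "...":
        (base,) = merge_bases(x, y)    # assumes a single merge base
        return (base, y)
    raise ValueError(f"unsupported separator: {dots!r}")


print(diff_endpoints("C", "G", ".."))   # ('C', 'G')
print(diff_endpoints("C", "G", "..."))  # ('B', 'G')
```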

Git's rebase is a little bit special

The rebase command is almost always used to move3 one of these DAGlet commit-subsets from one point in the graph to another. In fact, that's what rebase is, or was originally anyway, defined to do. (Now it has a fancy interactive rebase mode, which does this and a bunch more history editing operations. Mercurial has a similar command, hg histedit, with a slightly better name, and much tighter default semantics.4)

Since we always (or almost always) want to move a DAGlet, git rebase builds in this subset selection for us. And, since we always (or almost always) want to move the DAGlet to come just after the tip of some other branch, git rebase defaults to choosing the target (or --onto) commit using a branch name, and then uses that same branch name in the X..Y syntax.5


3Technically, git rebase actually copies commits, rather than moving them. It has to, because commits are immutable, like all Git's internal objects. The true name, the SHA-1 hash, of a commit is a checksum of the bits making up the commit, so any time you change anything—including something as simple as the parent ID—you have to make a new, slightly-different, commit.

4In Mercurial, quite unlike Git, branches really do have starting points, and—more important for histedit—commits record their phase: secret, draft, or published. History editing readily applies to secret or draft-phase commits, and not so much to published commits. This is true of Git as well, but since Git has no concept of commit phases, Git's rebase must use these other techniques.

5Technically the <upstream> and --onto arguments can just be raw commit IDs. Note that 1234567..develop works just fine as a range selector, and you can rebase --onto 1234567 to place the new commits after commit 1234567. The only place that git rebase truly needs a branch name is for the name of the current branch, which it normally just reads from HEAD anyway. However, we usually want to use a name, so that's how I describe it all here.


That is, if we're currently on branch develop, in the situation we drew before:

...--A--B--C           <-- master
         \
          D--E--F--G   <-- develop

we probably just want to move the D-E-F-G chain onto the tip of master, to get this:

...--A--B--C              <-- master
            \
             D'-E'-F'-G'  <-- develop

(The reason I changed the names from D-E-F-G to D'-E'-F'-G' is that rebase is forced to copy the original commits, rather than actually moving them. The new copies are just as good as the originals, and we can use the same single letter name, but we should at least note, however vaguely, that these are in fact copies. That's what the "prime" marks, the ' characters, are for.)
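Why a copy must get a new ID can be seen by hashing a commit object directly. The sketch below assumes Git's SHA-1 object hashing scheme (the header `commit <size>`, a NUL byte, then the raw commit text) and uses a made-up author line and made-up parent IDs; only the empty-tree ID is a real, well-known Git value.

```python
import hashlib

# Git's well-known ID for the empty tree (a real value in SHA-1 repositories).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(parent, tree=EMPTY_TREE, message="E"):
    """SHA-1 of a minimal commit object, hashed the way Git hashes objects:
    the header "commit <size>", a NUL byte, then the raw commit text."""
    who = "A U Thor <author@example.com> 1468800000 +0000"  # hypothetical author line
    text = "\n".join([f"tree {tree}", f"parent {parent}",
                      f"author {who}", f"committer {who}", "", message]) + "\n"
    body = text.encode()
    return hashlib.sha1(f"commit {len(body)}".encode() + b"\0" + body).hexdigest()


old_parent = "1" * 40   # hypothetical ID of the original parent, D
new_parent = "2" * 40   # hypothetical ID of the copied parent, D'
original = commit_id(old_parent)
copied = commit_id(new_parent)
print(original != copied)  # True: same tree and message, different parent
```

The same reasoning explains why, once D is copied to D', E, F, and G must be copied too: each child records its parent's ID, so each child's own ID changes.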

Because this is what we usually want, git rebase will do this automatically if we just name the other branch. That is, we're on develop now:

$ git checkout develop 

and we want to rebase commits that are on branch develop and are not on master, moving them to the tip of master. We might express this as git somecmd master..develop master, but then we would have to type the word master twice (such a dreadful fate). So instead, Git's rebase infers this when we just type in:

$ git rebase master 

The name master becomes the left side of the two-dot .. DAGlet selector, and the name master also becomes the target of the rebase; and Git then rebases D-E-F-G onto C. Git gets our branch's name, develop, by reading out the current branch name. In fact, it uses a shortcut, which is that when you need the current branch name, you can normally just write HEAD instead. So master..develop and master..HEAD mean the same thing, because HEAD is develop.

Git's rebase calls this name the <upstream>. That is, when we say git rebase master, Git claims, in the documentation, that master is the <upstream> argument to git rebase. The rebase command then operates on commits in <upstream>..HEAD, copying them after whatever commit is in <upstream>.

This is going to become a problem for us soon, but for now, just make note of it.

(Rebase also has the sneaky, but desirable, side feature of omitting any of the D-E-F-G commits that sufficiently resemble commit C. For our purposes we can ignore this.)
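The default `git rebase master` behavior can be sketched in the toy model as well. `rebase` here is a hypothetical function restricted to linear history, and it ignores the skip-similar-commits feature just mentioned: it picks `<upstream>..HEAD`, copies those commits oldest first after the `--onto` commit (defaulting to `<upstream>`), and then moves only the current branch name.

```python
parents = {"A": [], "B": ["A"], "C": ["B"],
           "D": ["B"], "E": ["D"], "F": ["E"], "G": ["F"]}
branches = {"master": "C", "develop": "G"}
head = "develop"  # HEAD holds the current branch name


def resolve(name):
    """Branch name, "HEAD", or raw commit ID -> commit ID."""
    if name == "HEAD":
        return branches[head]
    return branches.get(name, name)


def ancestors(commit):
    """Every commit reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rebase(upstream, onto=None):
    """Toy `git rebase [--onto ONTO] UPSTREAM` for linear history only."""
    target = resolve(upstream if onto is None else onto)
    exclude = ancestors(resolve(upstream))
    picked, c = [], resolve("HEAD")
    while c is not None and c not in exclude:   # walk UPSTREAM..HEAD
        picked.append(c)
        c = parents[c][0] if parents[c] else None
    tip = target
    for old in reversed(picked):      # copy oldest first
        new = old + "'"               # a copy, so it needs a new ID
        parents[new] = [tip]
        tip = new
    branches[head] = tip              # only the current branch name moves
    return [old + "'" for old in reversed(picked)]


print(rebase("master"))     # ["D'", "E'", "F'", "G'"]
print(branches["develop"])  # G'
print(parents["D'"])        # ['C']
```

The originals D through G remain in `parents`, unreferenced by any branch name, much as Git leaves them behind for the reflog to remember.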

What's wrong with the other answer to this question

In case the other answer gets deleted, or becomes one of several other answers, I'll summarize it here as "use git branch -f to move the branch label." The flaw in the other answer—and, perhaps more importantly, precisely when it's a problem—becomes obvious once we draw our graph DAGlets.

Branch names are unique, but tip commits are not necessarily so

Let's take a look at what happens when you run git checkout -b newbranch starting-point. This asks Git to root around in the current graph for the given starting-point, and make the new branch label point to that specific commit. (I know I said above that branches don't have a starting point. This is still mostly true: we're giving the git checkout command a starting point now, but Git is about to set it and then, crucially, forget it.) Let's say that starting-point is another branch name, and let's draw a whole bunch of branches:

          o--o--o--o     <-- brA
         /
...--o--o--o--o--o--o    <-- brB
            \
             o--o--o     <-- brC
                 \
                  o--o   <-- brD

Since we have four branch names, we have four branch tips: four branch-tip commits, identified by the names brA through brD. We pick one and make a new branch name newbranch that points to the same commit as one of these four. I have arbitrarily picked brA here:

          o--o--o--o     <-- brA, newbranch
         /
...--o--o--o--o--o--o    <-- brB
            \
             o--o--o     <-- brC
                 \
                  o--o   <-- brD

We now have five names, and five ... er, four? ... well, some tip commits. The tricky bit is that brA and newbranch both point to the same tip commit.

Git knows—because git checkout sets it—that we're now on newbranch. Specifically Git writes the name newbranch into HEAD. We can make our drawing a bit more accurate by adding this information:

          o--o--o--o     <-- brA, HEAD -> newbranch
         /
...--o--o--o--o--o--o    <-- brB
            \
             o--o--o     <-- brC
                 \
                  o--o   <-- brD

At this point, the four commits that used to be only on branch brA are now on both brA and newbranch. And, by the same token, Git no longer knows that newbranch starts at the tip of brA. As far as Git is concerned, both brA and newbranch contain those four commits and all the earlier ones too, and both of them "start" way back in time somewhere.

When we make new commits, the current name moves

Since we're on branch newbranch, if we make a new commit now, the new commit's parent will be the old tip commit, and Git will adjust the branch name newbranch to point to the new commit:

                     o   <-- HEAD -> newbranch
                    /
          o--o--o--o     <-- brA
         /
...--o--o--o--o--o--o    <-- brB
            \
             o--o--o     <-- brC
                 \
                  o--o   <-- brD

Note that none of the other labels moved: the four "old" branches stay put, only the current (HEAD) branch changes. It changes to accommodate the new commit we just made.

Note that Git continues to have no idea that branch newbranch "started" at brA. It's just the case, now, that newbranch contains one commit that brA does not, plus the four commits that they both contain, plus all those earlier commits.
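The two rules in this section (`checkout -b` resolves its starting point once and then forgets it; a new commit advances only the branch that HEAD names) fit in a tiny hypothetical `Repo` class. The commit IDs `a4` and `n1` are made up, and the history before `a4` is elided.

```python
class Repo:
    """Toy repository: branch names are just movable pointers to commit IDs."""

    def __init__(self):
        self.parents = {"a4": []}        # hypothetical tip of brA (history elided)
        self.branches = {"brA": "a4"}
        self.head = "brA"                # HEAD holds the current branch name
        self.count = 0

    def checkout_b(self, name, start=None):
        """Like `git checkout -b NAME [START]`: resolve START once, then forget it."""
        if start is None:
            tip = self.branches[self.head]
        else:
            tip = self.branches.get(start, start)
        self.branches[name] = tip
        self.head = name

    def commit(self):
        """New commit whose parent is the current tip; only HEAD's branch moves."""
        self.count += 1
        new = f"n{self.count}"
        self.parents[new] = [self.branches[self.head]]
        self.branches[self.head] = new
        return new


repo = Repo()
repo.checkout_b("newbranch", "brA")
print(repo.branches)  # {'brA': 'a4', 'newbranch': 'a4'}
repo.commit()
print(repo.branches)  # {'brA': 'a4', 'newbranch': 'n1'}
```

After `checkout_b`, the repository holds only names pointing at commit IDs; the string "brA" passed as the starting point is not recorded anywhere.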

What git branch -f does

Using git branch -f lets us move a branch label. Let's say, for whatever mysterious reason, we don't want branch label brB to point where it does in our current drawing. Instead, we want it to point to the same commit as brC. We can use git branch -f to change the place to which brB points, i.e., to move the label:

$ git branch -f brB brC

                     o   <-- HEAD -> newbranch
                    /
          o--o--o--o     <-- brA
         /
...--o--o--o--o--o--o    [abandoned]
            \
             o--o--o     <-- brC, brB
                 \
                  o--o   <-- brD

This makes Git "forget" or "abandon" those three commits that were only on brB before. That's probably a bad idea—why did we decide to do this strange thing?—so we probably want to put brB back.
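In model terms, "abandoned" means no branch name can reach the commit any more. The sketch below encodes the drawing with made-up commit IDs (`r1` through `r3` for the shared history, `b1` through `b3` for the commits only on brB, and so on) and ignores reflogs entirely.

```python
parents = {
    "r1": [], "r2": ["r1"], "r3": ["r2"],                    # shared history
    "a1": ["r2"], "a2": ["a1"], "a3": ["a2"], "a4": ["a3"],  # brA
    "b1": ["r3"], "b2": ["b1"], "b3": ["b2"],                # only on brB
    "c1": ["r3"], "c2": ["c1"], "c3": ["c2"],                # brC
    "d1": ["c2"], "d2": ["d1"],                              # brD
    "n1": ["a4"],                                            # newbranch
}
branches = {"brA": "a4", "brB": "b3", "brC": "c3", "brD": "d2", "newbranch": "n1"}


def ancestors(commit):
    """Every commit reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def reachable():
    """Every commit that at least one branch name can reach."""
    seen = set()
    for tip in branches.values():
        seen |= ancestors(tip)
    return seen


before = reachable()
branches["brB"] = branches["brC"]    # git branch -f brB brC
print(sorted(before - reachable()))  # ['b1', 'b2', 'b3']
```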

Reflogs

Fortunately, "abandoned" commits are normally remembered in what Git calls reflogs. Reflogs use an extended syntax, name@{selector}. The selector part is usually either a number or date, such as brB@{1} or brB@{yesterday}. Every time Git updates a branch name to point to some commit, it writes a reflog entry for that branch, with the pointed-to commit's ID, a time-stamp, and an optional message. Run git reflog brB to see these. The git branch -f command wrote the new target as brB@{0}, bumping up all the old numbers, so now brB@{1} names the previous tip commit. So:

$ git branch -f brB 'brB@{1}'
    # you may not need the quotes, 'brB@{...}' --
    # I need them in my shell, otherwise the shell
    # eats the braces.  Some shells do, some don't.

will put it back (and also renumber all the numbers again: each update replaces the old @{0} and makes it @{1}, and @{1} becomes @{2}, and so on).
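The renumbering behaves like pushing onto the front of a list. This hypothetical `Ref` class models a single branch name with its reflog, where `log[n]` plays the role of `name@{n}`; it ignores timestamps, messages, and expiry, and the commit IDs are made up.

```python
class Ref:
    """Toy branch name with a reflog; log[0] is name@{0}, the current value."""

    def __init__(self, commit):
        self.log = [commit]

    def update(self, commit):
        # The new value becomes @{0}; the old @{0} becomes @{1}, and so on.
        self.log.insert(0, commit)

    def at(self, n):
        return self.log[n]  # name@{n}

    @property
    def tip(self):
        return self.log[0]


brB = Ref("b3")          # hypothetical ID of brB's original tip
brB.update("c3")         # git branch -f brB brC
print(brB.at(1))         # b3
brB.update(brB.at(1))    # git branch -f brB 'brB@{1}'
print(brB.tip, brB.log)  # b3 ['b3', 'c3', 'b3']
```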

Anyway, suppose that we do our git checkout -b newbranch while we're on brC, and fail to mention brA. That is, we start with:

          o--o--o--o     <-- brA
         /
...--o--o--o--o--o--o    <-- brB
            \
             o--o--o     <-- HEAD -> brC
                 \
                  o--o   <-- brD

and run git checkout -b newbranch. Then we get this:

          o--o--o--o     <-- brA
         /
...--o--o--o--o--o--o    <-- brB
            \
             o--o--o     <-- brC, HEAD -> newbranch
                 \
                  o--o   <-- brD

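If we meant to make newbranch point to the commit that brA names, we can in fact do that right now with git branch -f, with one caveat: Git refuses to force-update the branch that is currently checked out, so we would first switch back to brC (or use git checkout -B newbranch brA, which resets the current branch). But let's say we make a new commit before realizing that we made newbranch start at the wrong point. Let's draw it in: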

          o--o--o--o     <-- brA
         /
...--o--o--o--o--o--o    <-- brB
            \
             o--o--o     <-- brC
                 \  \
                 |   o   <-- HEAD -> newbranch
                 \
                  o--o   <-- brD

If we use git branch -f now, we'll abandon—lose—the commit we just made. What we want instead is to rebase it, onto the commit that branch brA points-to.

A simple git rebase copies too much

What if, instead of using git branch -f, we use git rebase brA? Let's analyze this using (what else?) our DAGlets. We start with the drawing above, with the extended leg going out to brD, though in the end we get to ignore that leg, and with the section going to brB, most of which we also get to ignore. What we don't get to ignore is all the stuff in the middle that we get by tracing the lines back.

The git rebase command, in this form, will use brA..newbranch to pick commits to copy. So, starting with the whole DAGlet, let's mark (with *) all the commits that are on (or contained in) newbranch:

          o--o--o--o     <-- brA
         /
...--*--*--*--o--o--o    <-- brB
            \
             *--*--*     <-- brC
                 \  \
                 |   *   <-- HEAD -> newbranch
                 \
                  o--o   <-- brD

Now, let's un-mark (with x) all the commits that are on (or contained in) brA:

          x--x--x--x     <-- brA
         /
...--x--x--*--o--o--o    <-- brB
            \
             *--*--*     <-- brC
                 \  \
                 |   *   <-- HEAD -> newbranch
                 \
                  o--o   <-- brD

Whatever remains—all the * commits—are the ones that git rebase will copy. That's way too many!

We need to get git rebase to copy just the one commit. What this means is that for the <upstream> argument, we need to give git rebase the name brC.6 That way, Git will use brC..HEAD to select the commits to copy, which will be just the one commit we need to copy.
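Using the same kind of toy graph (made-up IDs; `n1` is the commit made on newbranch after branching from brC), the two candidate selections come out like this:

```python
parents = {
    "r1": [], "r2": ["r1"], "r3": ["r2"],                    # shared history
    "a1": ["r2"], "a2": ["a1"], "a3": ["a2"], "a4": ["a3"],  # brA
    "b1": ["r3"], "b2": ["b1"], "b3": ["b2"],                # brB
    "c1": ["r3"], "c2": ["c1"], "c3": ["c2"],                # brC
    "d1": ["c2"], "d2": ["d1"],                              # brD
    "n1": ["c3"],                                            # newbranch
}
branches = {"brA": "a4", "brB": "b3", "brC": "c3", "brD": "d2", "newbranch": "n1"}


def ancestors(commit):
    """Every commit reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def two_dot(x, y):
    """X..Y as set subtraction: ancestors(Y) minus ancestors(X)."""
    return ancestors(branches[y]) - ancestors(branches[x])


print(sorted(two_dot("brA", "newbranch")))  # ['c1', 'c2', 'c3', 'n1', 'r3']
print(sorted(two_dot("brC", "newbranch")))  # ['n1']
```

The first set matches the five starred commits in the drawing; the second contains only the one commit we actually want to move.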

But—alas!—now we have a big problem, because git rebase wants to copy the commit to a point right after the <upstream> we just gave it. That is, it wants to copy the commits to just after brC. That's where the commits are now! (Well, the one commit is.) So this is no good at all!

Fortunately, git rebase has an escape hatch, specifically the --onto argument. I mentioned this several times before, but now is when we need it. We want the copies to go right after brA, so that's what we will supply as the --onto argument. Git's rebase uses the <upstream> by default, but if we give it an --onto, it uses that instead. So:

$ git branch   # just checking...
  brA
  brB
  brC
  brD
  master
* newbranch

OK, good, we're still on newbranch. (Note that git status works here too, and if you use one of those fancy shell prompt setup things, you can even get your current branch name to be in your prompt, so that you don't need to run git status as often.)

$ git rebase --onto brA brC 

Now Git will select commits in brC..HEAD, which is the right set of commits to copy, and copy them right after the tip of brA, which is the right place to copy them to. Once the copies are all done, Git will abandon the original commits7 and make the name newbranch point to the new, tip-most, copied commit.

Note that this works even if you have no new commits on the new branch. This is the one case where git branch -f also works. When there are no commits, this git rebase carefully copies all zero of them :-) and then makes the name, newbranch, point to the same commit as brA. Hence git branch -f is not always wrong; but git rebase is always right—albeit somewhat clumsy: you must identify both the <upstream> and the --onto points manually.
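Putting the pieces together, the following toy sketch simulates git rebase --onto brA brC on newbranch, both with one new commit and with none. As before, the IDs are made up, `rebase_onto` is a hypothetical helper that handles only linear chains, and the original commits are left in place (abandoned), as Git leaves them for the reflog.

```python
def base_graph():
    """The multi-branch drawing, with made-up commit IDs."""
    parents = {
        "r1": [], "r2": ["r1"], "r3": ["r2"],                    # shared history
        "a1": ["r2"], "a2": ["a1"], "a3": ["a2"], "a4": ["a3"],  # brA
        "b1": ["r3"], "b2": ["b1"], "b3": ["b2"],                # brB
        "c1": ["r3"], "c2": ["c1"], "c3": ["c2"],                # brC
        "d1": ["c2"], "d2": ["d1"],                              # brD
    }
    branches = {"brA": "a4", "brB": "b3", "brC": "c3", "brD": "d2"}
    return parents, branches


def ancestors(parents, commit):
    """Every commit reachable from `commit`, including `commit` itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def rebase_onto(parents, branches, head, onto, upstream):
    """Toy `git rebase --onto ONTO UPSTREAM` while on `head` (linear history only)."""
    exclude = ancestors(parents, branches[upstream])
    picked, c = [], branches[head]
    while c is not None and c not in exclude:   # walk UPSTREAM..HEAD
        picked.append(c)
        c = parents[c][0] if parents[c] else None
    tip = branches[onto]
    for old in reversed(picked):                # copy oldest first
        new = old + "'"
        parents[new] = [tip]
        tip = new
    branches[head] = tip   # with zero copies, the name simply lands on ONTO
    return [old + "'" for old in reversed(picked)]


# Case 1: `git checkout -b newbranch` while on brC, then one commit.
parents, branches = base_graph()
parents["n1"] = ["c3"]
branches["newbranch"] = "n1"
print(rebase_onto(parents, branches, "newbranch", "brA", "brC"))  # ["n1'"]
print(branches["newbranch"], parents["n1'"])                       # n1' ['a4']

# Case 2: the same mistake, caught before any commit: zero copies.
parents, branches = base_graph()
branches["newbranch"] = "c3"
print(rebase_onto(parents, branches, "newbranch", "brA", "brC"))  # []
print(branches["newbranch"] == branches["brA"])                   # True
```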


6Or, as we noted in an earlier footnote, we can give git rebase the ID of the commit to which the name brC points. Either way, we have to supply this as the upstream argument.

7Except, of course, reflog entry newbranch@{1} will remember the old, now-abandoned, tip commit. Additional reflog entries for newbranch may remember yet more commits, and remembering the tip commit suffices to keep all its ancestors alive. The reflog entries eventually expire—after 30 days for some cases, and 90 for others, by default—but this gives you up to a month or so, by default, to recover from mistakes.

answered Sep 20 '22 at 13:09 by torek