 

How to make existing branch an orphan in git

Is there a way to make an existing branch an orphan in git?

git checkout --orphan seems to only create a new orphan?

asked May 17 '15 05:05 by LazerSharks




1 Answer

You are correct that git checkout --orphan creates only new orphan branches. The trick is that this process leaves the index undisturbed. Thus, Nick Volynkin's answer will work, as long as your Git is not too ancient.

If you want to keep the original commit message, you can replace his:

$ git commit -m'first commit in orphan'

with:

$ git commit -C master~2

If your Git is sufficiently old that you do not have git checkout --orphan, this should also do it:

$ commit=<hash>  # or, e.g., commit=$(git rev-parse master~2)
$ git branch newbranch $( \
    git log --no-walk --pretty=format:%B $commit | \
    git commit-tree -F - "${commit}^{tree}" \
)
$ git checkout newbranch
$ git cherry-pick $commit..master # may want -x; see below

where you choose the starting-point from git log or using the ~ syntax with an existing branch name (this continues to use master~2 as in Nick's answer).

If all you wanted was a recipe, that should do the trick, but if you want to know what's going on and why this works (and when it doesn't), read on. :-)

Things you need to know about branches

Before we go any further, it seems like a good idea to define some terms and describe what's going on.

Branch names vs the commit graph

First, let's make a clear distinction between a branch name, like master or newbr, and various portions of the commit graph. A branch name simply points to one commit, designated a tip commit or branch tip, within the graph:

*--o--o---o--o    <-- master
    \    /
     o--o--o--o   <-- brA
            \
             o    <-- brB

This graph has three branch tips, pointed-to by master, brA, and brB. The ancestry of the tip of brB, for instance, goes back in a wiggly line, always moving leftward and sometimes up too, to the (single) root commit * (as distinguished from all the other non-root o commits). The fact that commit * has no commits to its left—no parent commit to point-to—is what makes it a root commit.

This root commit is on all the branches. Other commits are on multiple branches as well. There's a merge commit on master, which brings commits from brA in, even though brA then has two commits that master does not, for instance. To follow master back to the root, you must go straight left, and also down-and-left at the merge, and then back up-and-left where brA splits off.

Note that we can have multiple branch names pointing to a single commit, or branch names that point to "tip" commits that are embedded within another branch:

*--o--o---o--o    <-- master
    \    /
     o--o--o      <-- brA
            \
             o    <-- brB, brC

Here we've "rewound" branch brA by one commit, so that the right-side middle-row commit is the tip of brA, even though it's one commit back from the tip of brB. We've added a new branch, brC, that points to the same commit as brB (making it a tip twice, as it were; let's hope this commit is not a tip in the British-English "rubbish tip" sense of the word: "ugh, this commit is an absolute tip!").
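To see that a branch name is nothing more than a pointer to one commit, here is a minimal sketch in a throwaway repository. The repository location, the demo identity, and the commit messages are all made up for illustration; only the branch names come from the diagrams above.

```shell
# Throwaway repository; nothing here touches your real work.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m 'root'
git commit -q --allow-empty -m 'second'

# Two new names for the current tip commit, and one for its parent.
git branch brB
git branch brC
git branch brA master~1

tip_a=$(git rev-parse brA)
tip_b=$(git rev-parse brB)
tip_c=$(git rev-parse brC)
echo "brA=$tip_a"
echo "brB=$tip_b"
echo "brC=$tip_c"

# brA's tip is "embedded": it is an ancestor of brB's tip.
if git merge-base --is-ancestor brA brB; then
    embedded=yes
else
    embedded=no
fi
echo "brA contained in brB: $embedded"
```

Here brB and brC print the same hash, while brA prints the hash of their parent.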

The DAG

The graph has a series of nodes o, each of which points to some parent(s) that are generally on its left. The lines (or arrows, really) connecting the nodes are directed edges: one-way streets or rail-lines, if you will, connecting child nodes in the graph back to their parents.

The nodes, plus the directed edge links from child to parent, form the commit graph. Because this graph is directed (child to parent) and acyclic (once you depart a node, you can never come back to it), this is called a Directed Acyclic Graph or DAG. DAGs have all kinds of nice theoretical properties, most of which we can ignore for this SO answer.

DAGs may have disconnected subgraphs

Now let's consider this alternative graph:

*--o--o---o--o   <-- master
    \    /
     o--o--o     <-- brA

*--o--o--o       <-- orph

This new branch, whose tip is named orph, has its own root and is completely disconnected from the other two branches.

Note that multiple roots are a necessary precondition for having (non-empty) disjoint sub-graphs, but depending on how you want to view these graphs, they may not be sufficient. If we were to merge (the tip commit of) brA into orph,[1] we would get this:

*--o--o---o--o   <-- master
    \    /
     o--o--o     <-- brA
            \
*--o--o--o---o   <-- orph

and the two "graph fragments" are now joined. However, there exist sub-graphs (such as those starting from orph^1 and brA, the two parents of orph) that are disjoint. (This is not particularly relevant to creating orphan branches, it's just something you should understand about them.)
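One way to check whether two branches live in disconnected parts of the graph is to count root commits and ask for a merge base. This is only a sketch in a scratch repository; the demo identity and commit messages are assumptions, and the empty commits exist purely to build the graph.

```shell
# Throwaway repository with two unrelated histories.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m 'master root'
git commit -q --allow-empty -m 'master second'

# Start a second, unrelated history with its own root.
git checkout -q --orphan orph
git commit -q --allow-empty -m 'orph root'

# A root commit is a commit with no parents.
roots=$(git rev-list --count --max-parents=0 --all)
echo "root commits: $roots"

# With no shared ancestor, git merge-base exits non-zero.
if git merge-base master orph >/dev/null; then
    related=yes
else
    related=no
fi
echo "master and orph related: $related"
```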


[1] Modern Git rejects a casual attempt to do such a merge, since the two branches have no merge base. Older versions of Git do the merge, not necessarily with sensible results.


git checkout --orphan

The orph branch is the kind of branch that git checkout --orphan makes: a branch that will have a new, disconnected root.

The way it gets there is to make a branch name that points to no commit at all. Git calls this an "unborn branch", and branches in this state have only a sort of half-existence, because Git leaks the implementation through.

Unborn branches

A branch name, by definition, always points to the tip-most commit on that branch. But this leaves Git with a problem, especially in a totally fresh new repository that has no commits at all: where can master point?

The fact is that an unborn branch can't point anywhere, and because Git implements branch names by recording them as a <name, commit-ID> pair,[2] it simply cannot record the branch until there is a commit. Git's solution to this dilemma is to cheat: the branch name does not go into the branch records at all, but instead, only into the HEAD record.

The HEAD, in Git, records the current branch name. For "detached HEAD" mode, HEAD records the actual commit ID—and in fact, this is how Git determines whether a repository / work-tree is in "detached HEAD" mode: if its HEAD file contains a branch name, it is not detached, and if it contains a commit ID, it is detached. (No other states are permitted.)
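The two HEAD states can be told apart from the command line with git symbolic-ref, which succeeds only when HEAD holds a branch name. A small sketch, assuming a scratch repository and a made-up demo identity:

```shell
# Throwaway repository with a single commit.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m 'only commit'

# Attached: HEAD holds a branch name.
attached=$(git symbolic-ref -q HEAD)
echo "attached: HEAD -> $attached"

# Checking out a commit by its ID stores the raw ID in HEAD instead.
git checkout -q "$(git rev-parse HEAD)"
if git symbolic-ref -q HEAD >/dev/null; then
    state=attached
else
    state=detached
fi
echo "now $state: HEAD = $(git rev-parse HEAD)"
```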

Hence, to create an "orphan branch", or during that awkward period when there is no commit yet for master, Git stores the name in HEAD, but does not actually create the branch name yet. (That is, there is no entry in .git/refs/heads/ for it, and no line in .git/packed-refs for it.)

As a peculiar side effect, this means that you can only have one unborn branch. The unborn branch's name is stored in HEAD. Checking out another branch, with or without --orphan, or any commit by ID—any action that updates HEAD—wipes out all traces of the unborn branch. (A new git checkout --orphan, of course, replaces it with the trace of the new unborn branch.)
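The half-existence of an unborn branch is easy to observe in a brand-new repository: HEAD names the branch, but no such reference exists until the first commit. The following is a sketch only; the scratch directory and demo identity are assumptions.

```shell
# Brand-new repository: no commits yet.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# HEAD names master, but the branch itself does not exist yet.
head_ref=$(git symbolic-ref HEAD)
if git rev-parse -q --verify refs/heads/master >/dev/null; then
    before=exists
else
    before=unborn
fi
echo "HEAD -> $head_ref ($before)"

# Making the first commit is what creates the branch.
git commit -q --allow-empty -m 'first commit'
if git rev-parse -q --verify refs/heads/master >/dev/null; then
    after=exists
else
    after=unborn
fi
echo "after committing: $after"
```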

Once you make a first commit, the new branch springs into being, because...


[2] With "unpacked" references, the name is just a path in the file system: .git/refs/heads/master. The commit-ID is then simply the contents of this file. Packed references are stored differently, and Git is evolving other ways to handle the name-to-ID mapping, but this is the most basic and it's currently still needed to allow Git to work.

There are two obvious ways to keep unborn branches around, but Git uses neither of them. (For the record, these are: create an empty file, or use the special "null hash". The empty file trick has an obvious flaw: it will be very fragile in the face of command or computer crashes, much more so than using the null hash.)


The commit process

In general, the process of making a new commit, in Git, goes like this:

  1. Update and/or populate the index, also called the staging area or cache: git add various files. This step creates Git's blob objects, which store the actual file contents.

  2. Write the index into one or more tree objects (git write-tree). This step creates, or in a few rare cases reuses, at least one (top level) tree. That tree has entries for each file and sub-directory; for files, it lists the blob-ID, and for sub-directories, it lists (after creating) the tree that contains the sub-directory's files and trees. Note, incidentally, that this leaves the index undisturbed, ready for the next commit.

  3. Write a commit object (git commit-tree). This step needs a bunch of items. For our purposes the main interesting ones are the (single) tree object that goes with this commit—that's the one we just got from step 2—and a list of parent commit IDs.

  4. Write the new commit's ID into the current branch name.

Step 4 is how and why branch names always point to the tip commit. The git commit command gets the branch name from HEAD. It also, during step 3, gets the primary (or first, and usually only) parent commit ID the same way: it reads HEAD to get the branch name, then reads the tip commit ID from the branch. (For merge commits, it reads the additional parent IDs—usually just one—from MERGE_HEAD.)

Git's commit knows, of course, about unborn and/or orphan branches. If HEAD says refs/heads/master, but branch master does not exist ... well, then, master must be an unborn branch! So this new commit has no parent ID. It still has the same tree as always, but it is a new root commit. It still gets its ID written into the branch file, which has the side effect of creating the branch.

Hence, it's making the first commit on the new orphan branch that actually creates the branch.
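The four steps can be run by hand with plumbing commands. This is a sketch of the idea, not what git commit literally executes internally; the file name, messages, and demo identity are invented for the example.

```shell
# Throwaway repository on an unborn master branch.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Step 1: populate the index (this writes the blob object).
echo 'hello' > README.txt
git add README.txt

# Step 2: write the index out as a tree object.
tree=$(git write-tree)

# Step 3: write a commit object. No -p option means no parents: a root commit.
root=$(echo 'root commit' | git commit-tree "$tree")

# Step 4: record the new commit's ID under the branch name, creating the branch.
git update-ref refs/heads/master "$root"

# A second commit names the first one as its parent with -p.
echo 'more' >> README.txt
git add README.txt
tree2=$(git write-tree)
second=$(echo 'second commit' | git commit-tree "$tree2" -p "$root")
git update-ref refs/heads/master "$second"

# Show abbreviated hash, parent hash(es), and subject for each commit.
git log --format='%h %p %s' master
```

The root commit shows an empty parent column, which is exactly what makes it a root.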

Things you need to know about cherry-pick

Git's cherry-pick command is very simple in theory (practice gets a bit complicated sometimes). Let's go back to our example graphs and illustrate a typical cherry-pick operation. This time, in order to talk about some of the specific commits within the graph, I'll give them single-letter names:

...--o--o--A--B--C   <-- mong
      \
       o--o          <-- oose

Let's say we'd like to cherry-pick commit B from branch mong into branch oose. That's easy, we just do:

$ git checkout oose; git cherry-pick mong~1

where mong~1 designates commit B. (This works because mong designates commit C, and C's parent is B, and mong~1 means "move back one parent commit along the main-line of first-parent links." Likewise mong~2 designates commit A, and mong~3 designates the o just before A, and so on. As long as we don't traverse a merge commit, which has multiple parents, everything is very simple here.)

But how does git cherry-pick actually work? The answer is: it first runs git diff. That is, it constructs a patch, of the sort shown by git log -p or git show.

Commits have complete trees

Remember (from our earlier discussion) that each commit has an attached tree object. That tree holds the entire source tree as of that commit: a snapshot of everything that was in the index/staging-area when we made that commit.

This means that commit B has a whole complete work-tree associated with it. But we want to cherry-pick the changes we made in B, not the tree of B. That is, if we changed README.txt, we want to get the change we made: not the old version of README.txt, and not the new version, just the changes.

The way we find this is that we go from commit B back to its parent, which is commit A. Commit A also has a whole complete work-tree. We just run git diff on the two commits, which shows us what we changed in README.txt, along with any other changes we made.

Now that we have the diff / patch, we go back to where we are now—the tip commit of branch oose, and the files we have in our work-tree and in our index/staging-area that correspond to that commit. (The git cherry-pick command will, by default, refuse to run at all if our index does not match our work-tree, so we know they are the same.) Now Git simply applies (as with git apply) the patch we just obtained by diffing commits A and B.

Hence, whatever changes we made to go from A to B, we make those now, to our current commit / index / work-tree. If all goes well, this gives us modified files, which Git automatically git adds to our index; and then Git runs git commit to make a new commit, using the log message from commit B. If we ran git cherry-pick -x, Git adds the phrase "cherry-picked from ..." to our new commit's log message.
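The diff-then-apply description can be imitated by hand and compared against the real command. Treat this as an approximation: current Git implements cherry-pick with its merge machinery rather than a literal git diff piped to git apply, though for a simple, conflict-free change both routes end at the same tree. The file names, contents, the extra branch oose2, and the demo identity are all assumptions made for this sketch.

```shell
# Throwaway repository: mong gets commits A, B, C on top of a base commit.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/mong
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'readme v1' > README.txt
echo 'notes v1' > notes.txt
git add . && git commit -q -m 'base'
git branch oose                       # oose stays at the base commit

echo 'readme v2' > README.txt && git commit -q -am 'A: edit README'
echo 'notes v2' > notes.txt && git commit -q -am 'B: edit notes'
echo 'readme v3' > README.txt && git commit -q -am 'C: edit README again'

# By hand: diff B (mong~1) against its parent A (mong~2), apply the patch
# to oose's index and work-tree, then commit reusing B's log message.
git checkout -q oose
git diff mong~2 mong~1 | git apply --index
git commit -q -C mong~1

# The real command, on a second branch that also starts at the base commit.
git checkout -q -b oose2 oose~1
git cherry-pick mong~1 >/dev/null

by_hand=$(git rev-parse 'oose^{tree}')
by_git=$(git rev-parse 'oose2^{tree}')
echo "tree by hand:        $by_hand"
echo "tree by cherry-pick: $by_git"
```

Both branches pick up the notes.txt change from B while keeping the base version of README.txt, which is the "changes, not the tree" behavior described above.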

(Hint: you usually want to use -x. It probably should be the default. The main exception is when you are not going to keep the original commit you've just cherry-picked. One can also argue that using cherry-pick is usually wrong—it's an indication that you did something wrong earlier, really, and now have to paper over it, and the papering-over may not hold up in the long run—but that's for another [long] posting entirely.)

Cherry-picking in an orphan branch

VonC noted that in Git 2.9.1 and later, git cherry-pick works in an orphan branch; in an upcoming release it works for sequences as well as for individual commits. But there is a reason this has been impossible for so long.

Remember, cherry-pick turns a tree into a patch, by diffing a commit against its parent (or, in the case of a merge commit, the parent you choose with the -m option). It then applies that patch to the current commit. But an orphan branch—a branch we have not yet made—has no commits, hence no current commit, and—at least in a philosophical sense—no index and no work-tree. There is simply nothing to patch.

In fact, though, we can (and Git now does) just bypass this entirely. If we ever had a current commit—if we had something checked out at some point—then we still have, right now, an index and a work-tree, left-over from the most recent "current commit", whenever we had it.

This is what git checkout --orphan orphanbranch does. You check out some existing commit and hence populate the index and work-tree. Then you git checkout --orphan orphanbranch and git commit, and the new commit uses the current index to create (or actually, reuse) a tree. That tree is the same tree as the commit you had checked out before you did git checkout --orphan orphanbranch.[3]

This is where the main part of my recipe for very-old-Git comes from as well:

$ commit=$(git rev-parse master~2)
$ git branch newbranch $( \
    git log --no-walk --pretty=format:%B $commit | \
    git commit-tree -F - "${commit}^{tree}" \
)
$ git checkout newbranch

First we find the desired commit and its tree: the tree associated with master~2. (We don't actually need the variable commit, but writing it out like this lets us cut-and-paste a hash from git log output, without having to count how far back it is from the tip of master or whichever branch we are going to use here.)

Using ${commit}^{tree} tells Git to find the actual tree associated with the commit (this is standard gitrevisions syntax). The git commit-tree command writes a new commit into the repository, using this tree we just supplied. The parent(s) of the new commit come from the parent IDs we supply using -p options: we use none, so the new commit has no parents, i.e., is a root commit.

The log message for this new commit is whatever we supply on standard input. To get this log message, we use git log --no-walk --pretty=format:%B, which just prints the full text of the message to standard output.

The git commit-tree command produces as its output the ID of the new commit:

$ ... | git commit-tree "master~2^{tree}"
80c40c288811ebc44e0c26a5b305e5b13e8f8985

(each run produces a different ID unless all are run in the same one-second period, since each one has a different set of time-stamps; the actual ID is not terribly important here). We give this ID to git branch to make a new branch name that points to this new root commit, as its tip commit.

Once we have the new root commit on a new branch, we can git checkout the new branch, and we're ready to cherry-pick the remaining commits.
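To convince yourself that the commit-tree recipe really produces a parentless commit with the same tree and message, you can try it in a scratch repository first. The four-commit history, file name, and demo identity below are assumptions for the sake of the sketch; the recipe itself is the one shown above.

```shell
# Throwaway repository with four commits on master.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for n in 1 2 3 4; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# The recipe: a new root commit reusing master~2's tree and log message.
commit=$(git rev-parse master~2)
git branch newbranch $(
    git log --no-walk --pretty=format:%B "$commit" |
    git commit-tree -F - "${commit}^{tree}"
)

# Inspect the result: no parents, same subject, same tree.
parents=$(git log -1 --format=%P newbranch)
subject=$(git log -1 --format=%s newbranch)
old_tree=$(git rev-parse "${commit}^{tree}")
new_tree=$(git rev-parse 'newbranch^{tree}')
echo "parents: '$parents'"
echo "subject: $subject"
echo "trees:   $old_tree vs $new_tree"
```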


[3] In fact, you can combine these as usual:

git checkout --orphan orphanbranch master~2

which first checks out (puts into the index and work-tree) the contents of the commit identified by master~2, then sets up HEAD so that you are on the unborn branch orphanbranch.


Using git cherry-pick into an orphan branch is not as useful as we might like

I have here a newer version of Git built (it fails some of its own tests—dies during t3404-rebase-interactive.sh—but otherwise seems mostly OK):

$ alias git=$HOME/.../git
$ git --version
git version 2.9.2.370.g27834f4

Let's check out, with --orphan, master~2 with new name orphanbranch:

$ git checkout --orphan orphanbranch master~2
Switched to a new branch 'orphanbranch'
$ git status
On branch orphanbranch

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   .gitignore
    new file:   a.py
    new file:   ast_ex.py
[snip]

Because this is a new branch, it looks to Git as though everything is new. If I now try to git cherry-pick either master~2 or master~1:

$ git cherry-pick master~2
error: Your local changes would be overwritten by cherry-pick.
hint: Commit your changes or stash them to proceed.
fatal: cherry-pick failed
$ git cherry-pick master~1
error: Your local changes would be overwritten by cherry-pick.
hint: Commit your changes or stash them to proceed.
fatal: cherry-pick failed

What I would have to do is clean everything out, in which case applying the change from master~3 to master~2 would be unlikely to work, or just do an initial git commit anyway, to make a new root commit based on the tree from master~2.

Conclusion

If you have git checkout --orphan, just use that to check out the target commit oldbranch~N (or by hash ID, which you can cut-and-paste from git log output):

$ git checkout --orphan newbranch oldbranch~N

then make the new commit immediately, as Nick Volynkin said (you can copy its message):

$ git commit -C oldbranch~N

so as to create the branch; and then use git cherry-pick with oldbranch~N..oldbranch to get the remaining commits:

$ git cherry-pick oldbranch~N..oldbranch

(And maybe use -x, depending on whether you plan to strip the commits from oldbranch.) Remember, oldbranch~N..oldbranch excludes the commit oldbranch~N itself, but that's actually good because that's the one we made as the new root commit.
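If you would like to rehearse the whole sequence before touching a real repository, something like the following should work. It is a sketch under assumptions: master plays the role of oldbranch, N is 2, and the four-commit history, file name, and demo identity are invented.

```shell
# Throwaway repository with four commits on master.
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for n in 1 2 3 4; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# 1. Unborn branch whose index and work-tree match master~2.
git checkout -q --orphan newbranch master~2

# 2. First commit creates the branch as a new root, reusing the old message.
git commit -q -C master~2

# 3. Replay everything after master~2 on top of the new root.
git cherry-pick master~2..master >/dev/null

count=$(git rev-list --count newbranch)
echo "commits on newbranch: $count"

# Same final content as master, but no shared history.
if git diff --quiet master newbranch; then same=yes; else same=no; fi
if git merge-base master newbranch >/dev/null; then shared=yes; else shared=no; fi
echo "same content: $same, shared history: $shared"
```

The new branch ends up with three commits (the new root plus the two cherry-picked ones), the same files as master, and no merge base with it.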

answered Sep 19 '22 21:09 by torek