
Can you remove Git commits from a branch that exist in another branch like an "intersection" of commits?

I sometimes find myself in a simple situation: I am working on some changes and I create a branch. As I move along in my changes, I start finding things that need cleanup, or some other partially related thing that I want to start changing. I want to keep the branches specific, so I quickly spin off another branch to start working on this other set of changes, which may not have the same dependencies as the previous branch. In the end, I have 2 branches where I tried to isolate the changes; however, the 2nd branch originated from the first branch, which in turn came from 'master'.

I can update (and have updated) each branch individually by merging 'master' into each of them, and I want to get the 2nd branch ready to be merged into 'master', because it has less drastic dependencies than the 1st branch. However, the 2nd branch also contains the changes made in the 1st branch, since it was spun off of it.

So I'm wondering, is there a way to tell Git something like: "Remove all the commits that exist in this other branch", so that I'm left with my 2nd branch without all the changes done in the 1st branch? That would allow me to merge the 2nd branch into 'master' and then go back to work on the 1st branch I created.

It's possible that I'm just not finding the right terminology in Git to see how it can already do that. But also, maybe it can't. It would seem like it should be very doable though, seeing how Git is great at showing me only the appropriate diffs between branch 1 and 2 even after I individually update both branches from 'master'.

And "removing" from the branch isn't strictly necessary. Even creating yet another branch that somehow excludes the changes made in the 1st branch, which are also in the 2nd branch, would be sufficient.

asked Mar 18 '17 at 17:03 by user2415376




2 Answers

Yes, you can do this. In some cases it's even ridiculously easy, as it's just done automatically by git rebase. In some cases it's ridiculously hard. Let's take a look at the cases.

First, it's crucial, as it almost always is in Git, to draw the commit graph. To get there, let's start with reviewing Git basics. (This is a good idea since a lot of Git tutorials skip right over the basics, as the basics are boring and confusing. :-) ) First, let's look at what a commit is and does for you.

What a commit is and does for you

A commit, in Git, is a totally concrete thing. We can look at any actual commit—most of them are pretty small—not with git show, which fancies them up a lot, but with git cat-file -p, which shows the immediate, raw contents (well, tree objects require minor tweaking, so sometimes "mostly raw") of an actual Git object:

$ git cat-file -p 3bc53220cb2dcf709f7a027a3f526befd021d858
tree 5654dad720d5b0a8177537390575cd6171c5fc50
parent 3e5c63943d35be1804d302c0393affc4916c3dc3
author Junio C Hamano <[email protected]> 1488233064 -0800
committer Junio C Hamano <[email protected]> 1488233064 -0800

First batch after 2.12

Signed-off-by: Junio C Hamano <[email protected]>

That's one whole commit right there. Its name—the one name that identifies that commit, from now until forever—is 3bc5322... (some big ugly hash ID that humans never want to deal with if they can avoid it). It stores several more big ugly hash IDs. One is for a tree, and some number—usually just one, again—are for parents. It has an author (name, email address, and time-stamp) and committer, who are usually the same; and it has a log message, which is whatever you want to write.

The tree attached to a commit is a source-tree snapshot. It's the whole thing, not a set of changes. (Underneath, Git does get clever with compression, but the hash ID of the tree gets you hash IDs of files, and those files are the complete files, not some weird compressed thing.) Have Git extract that tree, and you get all the files.
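To see these pieces for yourself, here is a minimal sketch that builds a throwaway repository and walks from a commit to its tree to a stored file. The repository location, the identity, and the file name README are all placeholders chosen for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "first commit"

kind=$(git cat-file -t HEAD)               # the object's type
git cat-file -p HEAD                       # tree, author, committer, message (no parent line: root commit)
tree=$(git rev-parse 'HEAD^{tree}')        # hash ID of the snapshot attached to the commit
git cat-file -p "$tree"                    # one entry per file: mode, type, hash ID, name
content=$(git cat-file -p HEAD:README)     # the complete stored file, not a diff
echo "$content"
```

The commit names a tree, the tree names the files, and each file comes back whole.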

Because each commit stores a parent hash ID, we can start with a recent commit and work backwards. That's Git for you: backwards. We start with the hash ID of the most recent commit, which we have Git save for us in a branch name. We say that this branch name points to the commit:

<--C   <--master

The name master points to commit C. (I'm using one letter names instead of big ugly hash IDs, which limits me to 26 commits but is a lot more convenient.) Commit C has another hash ID in it though, so C points to another commit. That's C's parent, B. B of course also points to another commit, but let's say our repository only has three commits total, so that B points back to A but A was the first commit.

Since A was the first, it cannot have a parent. So it doesn't: it does not point back any further. We call A a root commit. Every repository has at least one (and usually only one) root commit.[1] That's where the action has to stop: we (or Git) can't go back any further.

In any case, commits, once made, are permanent and unchanging.[2] This is because their hash ID is made by computing a cryptographic hash of all of the bits inside the commit (all the stuff you see with git cat-file -p). If you change anything, you get a new and different hash ID. Each hash ID is always unique.[3]
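A small sketch of "change anything, get a new hash ID", assuming a throwaway repository with placeholder names: amending only the log message produces a different commit, while the snapshot stays the same and the original commit remains stored.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "hello" > README
git add README
git commit -q -m "first message"
old=$(git rev-parse HEAD)

git commit -q --amend -m "second message"  # same files, different log message
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"
git cat-file -p "$old" | tail -n 1         # the original commit is still there, unchanged
```

The "amended" commit is really a new commit; the branch name simply points to it now.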

So, let's draw this out, but not bother with the internal arrows; let's just keep the one for the branch name itself.

A--B--C  <-- master

Each commit on its own, then, saves a snapshot for you. It's when you assemble them all together with their backwards arrows that you get the commit graph.


[1] Except, that is, a completely empty repository, which obviously has no commits. That's how you get root commits in the first place, by making a commit with no parent.

[2] Commits can, however, be garbage collected once you have no use for them. Git normally does this invisibly; we'll see how that comes about soon.

[3] Pay no attention to that web site behind the curtain! Seriously, though, the recent breakage of SHA-1 hashing is not an immediate problem for Git, but it does help push Git to switch to SHA-256.


Adding a new commit

Now that we see how the graph looks with three commits, let's add a new commit to master, to see how that works. First we will git checkout master as usual. That fills in the index and the work-tree. Then we'll work, git add stuff, and git commit.

(Reminder: the work-tree is, well, where you do your work. When Git saves files, it lists them under unpronounceable hash-ID names, and stores them compressed, and thus keeps them in a form useful only to Git itself. To use those files, you need them in their normal form, and that's the work-tree. Meanwhile the index is where you and Git build the next commit. You work on files in the work-tree, then you run git add to copy them from the work-tree, into the index. You can git add at any time: that just updates the index from the work-tree again. The index starts out matching the current commit, and then you modify it until you are ready to make a new commit.)

When you run git commit, Git collects up your log message, then:

  1. Writes out the index as a new tree: this is your saved snapshot, based on what you replaced in the index from the work-tree. The new tree gets its own hash ID.
  2. Writes a new commit object, with this new tree ID, the current commit's ID as its parent, you as the author and committer (with "now" as the time stamps), and your log message.

Step 2 gets Git a new hash ID for our new commit; let's call it D. Since the new commit has C's hash ID in it, D points back to C:

A--B--C     <-- master (HEAD)
       \
        D

The last thing Git does, though, is to write D's ID into the current branch name. If the current branch is master, this makes master point to D:

A--B--C
       \
        D   <-- master (HEAD)
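The same steps can be sketched with Git's lower-level commands. This is only an approximation of what git commit does (the real command also handles hooks, the message editor, and more); the repository, file name, and identity are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master, whatever the configured default
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "C"
parent=$(git rev-parse HEAD)               # commit C

echo "two" > file.txt
git add file.txt                                     # copy the new version into the index
tree=$(git write-tree)                               # step 1: write the index out as a tree
new=$(git commit-tree "$tree" -p "$parent" -m "D")   # step 2: commit object whose parent is C
git update-ref HEAD "$new"                           # last step: store D's ID in the current branch name
git log --format='%h %s'
```

After the last command, master points to D, and D points back to C.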

If we run git checkout -b branch first, though (that is, just before we make the new commit), then look at our new starting setup:

A--B--C     <-- branch (HEAD), master

Both names, branch and master, point to C, but HEAD says that we are on branch branch, not on master. So when we make D and Git updates the current branch, we get this:

A--B--C     <-- master
       \
        D   <-- branch (HEAD)

This is how branches grow. A branch name just points to the tip commit of a branch; it's the commits themselves that form the graph.

It's worth stopping at this point and thinking about commits A-B-C. They're on master, for sure. But they are also on branch branch. A commit, in Git, may be on many branches at the same time. What we need to do, quite often, is limit how far back we let Git go when we tell it: "Get me all the commits starting from this branch-tip and working backwards."
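One way to set that limit is the two-dot range syntax. The sketch below rebuilds the A-B-C-D drawing in a throwaway repository; the commit helper, which adds one small file per commit, is hypothetical and exists only for this illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit A; commit B; commit C
git checkout -q -b branch
commit D

total=$(git rev-list --count branch)           # everything reachable from branch: A, B, C and D
only=$(git log --format=%s master..branch)     # stop at anything reachable from master
echo "reachable from branch: $total"
echo "on branch but not on master: $only"
```

Here master..branch means "commits reachable from branch, minus commits reachable from master".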

Now for the exciting part!

Well, maybe exciting. :-) You have made several branches with a bunch of new commits, so let's draw that:

...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   <-- feature2

Here, master ends at G, i.e., commit G is the tip of master. feature1 ends at L, and feature2 ends at Q. Commits E-F-G are on all three branches. Commits P-Q are only on feature2. Commits I-J-K are on both feature1 and feature2. Commit L is only on feature1.

Remember, again, these letters stand in for big ugly hash IDs, where the actual hash ID encodes everything in the commit: the saved tree and the parent IDs. So L requires K's hash ID, for instance. This kind of thing matters because we intend to copy some commits.

What you described wanting to do is to somehow transplant commits P and Q so that they sit atop master. What if there were a way to copy commits? It turns out that there is: it's called git cherry-pick.

Cherry-picking

Remember that we noted earlier that a commit is a snapshot. It's not a set of changes. But right now we wish that a commit were a set of changes, because commit P is a lot like its parent commit K, but with some changes made. After all, you made P by having K checked out, then editing files and git adding the new versions into the index and then git committing.

Fortunately, there's an easy[4] way to turn a snapshot into a changeset, by comparing it (git diff) against its parent commit. The output from git diff is a minimal[5] set of instructions: "Remove this line from this file, add these other lines to that file, etc." Applying those instructions to the tree in K will turn it into the tree in P.
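As a small illustration, assume two commits standing in for K and P that differ by one added line in a placeholder file, notes.txt. Diffing the child against its parent recovers the change.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

printf 'one\ntwo\n' > notes.txt
git add notes.txt
git commit -q -m "K"
printf 'one\ntwo\nthree\n' > notes.txt
git commit -q -am "P"

patch=$(git diff HEAD^ HEAD)               # compare P's snapshot against its parent's snapshot
echo "$patch"
```

Both commits store complete copies of notes.txt; the "add this line" instruction exists only in the diff output.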

But what happens if we apply those instructions to some other tree? As it turns out, this often "just works". We can git checkout commit G—the tip commit of branch master, but let's use a different branch name:

...--E--F--G                <-- master, temp (HEAD)
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   <-- feature2

and then apply the diff to the work-tree. We'll assume that goes well, automatically git add the result to the index, and git commit while copying the log message from commit P. We'll call the new commit P' to say "like P, but with a different hash ID" (because it has a different tree, and a different parent):

             P'             <-- temp (HEAD)
            /
...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   <-- feature2

Now let's repeat this with Q. We run git diff P Q to turn Q into changes, apply those changes to P', and commit the result as new Q':

             P'-Q'          <-- temp (HEAD)
            /
...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   <-- feature2

This is just the two git cherry-pick steps, plus of course creating the temporary branch. But look what happens now if we erase the old name feature2 and change temp to feature2:

             P'-Q'          <-- feature2 (HEAD)
            /
...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   [abandoned]

It now looks like we made feature2 by doing git checkout -b feature2 master, then writing P' and Q' from scratch! That's just what you wanted.
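Spelled out as commands, the manual version might look like the sketch below. It rebuilds the drawing in a throwaway repository using a hypothetical commit helper that adds one file per commit, so the copies apply cleanly; real commits may well conflict.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit E; commit F; commit G
git checkout -q -b feature1
commit I; commit J; commit K
git checkout -q -b feature2
commit P; commit Q
git checkout -q feature1
commit L

P=$(git rev-parse feature2^)               # original P
Q=$(git rev-parse feature2)                # original Q
git checkout -q -b temp master             # temp starts at G
git cherry-pick "$P" "$Q" >/dev/null       # make P' and Q' on top of G
git branch -f feature2 temp                # move the name feature2 to Q'
git checkout -q feature2
git branch -q -d temp                      # the temporary name is no longer needed
git log --format=%s master..feature2
```

The original P and Q are not deleted by any of this; they merely lose their branch name.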


[4] Easy, that is, after any number of master's and/or PhD theses on string-to-string edit problems.

[5] "Minimal" in some sense, and somewhat tweak-able through different diff algorithms. Minimizing the edit distance is important for compression but not actually for correctness. However, when we go to apply the edit instructions to some other tree, the minimal-ness, and the exact instructions, really start to matter.


Git's rebase is automated cherry-pick plus the branch label moving

We can do all of the above at once using:

git checkout feature2
git rebase --onto master feature1

What we are doing here is using feature1 as a way to tell Git what to stop copying. Look back at the original graph, before the abandonment of the original commits. If we tell Git to start at feature1 and work backwards, that identifies commits L, K, J, I, G, F, and so on. Those are the commits we explicitly say not to copy: commits that are on branch feature1.

Meanwhile, the commits we do want to copy are those on feature2: Q, P, K, J, and so on. But we stop as soon as we hit any of the forbidden ones, so we'll copy only the P-Q commits.

The place we tell git rebase to copy to is—or is "just after"—the tip of master, i.e., copy commits so that they come after G.

Git rebase does it all for us, which is ridiculously easy. But there could be a hitch—or maybe several.
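Before running the rebase, it can help to preview which commits it will copy. The sketch below assumes the graph from the drawing, rebuilt in a throwaway repository with a hypothetical one-file-per-commit helper; the range feature1..feature2 selects the same commits the rebase treats as candidates.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit E; commit F; commit G
git checkout -q -b feature1
commit I; commit J; commit K
git checkout -q -b feature2
commit P; commit Q
git checkout -q feature1
commit L

would_copy=$(git log --format=%s feature1..feature2)   # on feature2 but not on feature1
echo "will be copied: $would_copy"

git checkout -q feature2
git rebase -q --onto master feature1       # copy those commits so they come after G
git log --format=%s master..feature2       # what feature2 now adds to master
```

If the preview lists more or fewer commits than you expect, that is the moment to redraw the graph, before anything has been copied.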

Solving hitches

Let's say that we start out with this as before:

...--E--F--G                <-- master (HEAD)
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   <-- feature2

We'd like to rebase feature2 onto master, skipping most of feature1, but it turns out we need what we changed in commit J too.

We don't need I, or K, or L, just J (plus of course P and Q).

We can't do this with just git rebase. We may need an explicit git cherry-pick, to copy J. But this is Git, so there are lots of ways to do this.

First, let's look at the explicit-cherry-pick method. We'll go ahead and make a new branch and cherry-pick J:

git checkout -b temp
git cherry-pick <hash-ID-of-J>

Now we have:

             J'             <-- temp (HEAD)
            /
...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   <-- feature2

Now we can transplant P-Q as before. We just change the --onto directive:

git checkout feature2
git rebase --onto temp feature1

The result is:

               P'-Q'        <-- feature2 (HEAD)
              /
             J'             <-- temp
            /
...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   [abandoned]

We don't need temp any more at all, so we can just git branch -d temp and straighten out our drawing:

             J'-P'-Q'       <-- feature2 (HEAD)
            /
...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   [abandoned]

Another way to get the same result

Suppose that, instead of copying just P-Q, we let git rebase copy I-J-K-P-Q. This might actually be easier:

git checkout feature2
git rebase master

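This time we don't need --onto: master tells Git both which commits to leave out and where to put the copies. We leave out commit G and earlier, and we copy after G. (One caveat: when feature2 already descends directly from the tip of master, as it does in this drawing, a plain git rebase master reports that the branch is up to date and copies nothing. Git makes the copies shown below only when forced to with --force-rebase, or when master has moved past G, in which case they land on the new tip.) The result is: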

             I'-J'-K'-P'-Q'  <-- feature2
            /
...--E--F--G                 <-- master
            \
             I--J--K--L      <-- feature1
                    \
                     P--Q    [abandoned]

Now we have too many commits copied, but now we run:

git rebase -i master

which gives us a bunch of "pick" lines for each commit I', J', K', P', and Q'. We delete the ones for I' and K'. Git now copies again, giving:

             J''-P''-Q''    <-- feature2
            /
...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   [abandoned]

which is what we want (the original copies are still in there, abandoned like the original-original P-Q, but they were there for so little time, who cares anyway? :-) ). And of course, we can make that first git rebase use -i and remove the pick lines, and just have the J'-P'-Q' copies, all in one step.
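For a repeatable demonstration, the editing step can be scripted by pointing GIT_SEQUENCE_EDITOR at a command that edits the todo list. This is only a sketch: the graph is simplified (feature1 is left out), the commit helper is hypothetical, and the sed -i.bak spelling is an assumption chosen because it is accepted by both GNU and BSD sed. In everyday use you would simply delete the lines in your editor.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G
git checkout -q -b feature2
for c in I J K P Q; do commit "$c"; done

# Stand-in for the interactive editor: delete the pick lines whose subject is I or K.
drop="sed -i.bak -e '/^pick .* I\$/d' -e '/^pick .* K\$/d'"
GIT_SEQUENCE_EDITOR="$drop" git rebase -q -i master

kept=$(git log --format=%s master..feature2)
echo "$kept"
```

Because each commit here touches its own file, dropping I and K cannot conflict; with real commits, later picks may depend on the ones you removed.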

Eliminating redundant commits

That's fine as far as it goes, but now there is both a J and a J'. Actually there's nothing wrong with this—you can leave this situation in place, and even merge with it like this, with no real harm done. But you might want to make J' part of master first and then share it.

Again, there is more than one way to do this. I want to illustrate one particular way, though, because git rebase has some magic in it.

Let's say that we have done the feature2 rebasing so that we have this now. We'll drop the abandoned commits entirely, just like Git does when it eventually gets around to garbage-collecting them (note: you get at least 30 days by default before this happens, giving you about a month to change your mind):

             J'-P'-Q'     <-- feature2
            /
...--E--F--G              <-- master
            \
             I--J--K--L   <-- feature1

You can now fast-forward master to include J':

git checkout master
git merge --ff-only <hash-id-of-J'>

This moves the labels, without changing the commit graph. To make it easy to draw in ASCII text, though, I'll move J' down one row:

                P'-Q'     <-- feature2
               /
...--E--F--G--J'          <-- master
            \
             I--J--K--L   <-- feature1

(We could also get here by explicitly git cherry-picking J into master originally, then rebasing feature2 without any fancy footwork.) So now we'd like to copy feature1's commits, adding them after J', and removing J.

We could do this with another git rebase -i, which lets us explicitly delete the original commit J. But we don't have to. Well, we don't have to, most of the time. Instead, we just run:

git checkout feature1
git rebase master

This tells Git that it should consider I-J-K-L as the candidates for the copy, and put the copies after J' (where master now points). But, and here's the magic, git rebase looks closely at all[6] of the commits that are on master that are not on feature1 (these are called the upstream commits, in at least a few bits of documentation). In this case, that's J' itself. For each such commit, Git diffs the commit against its parent (a la git cherry-pick) and turns the result into a patch ID. It does the same with each candidate commit. If one of the candidates (J) has the same patch ID as one of the upstream commits, Git eliminates the candidate from the list!

Hence, as long as both J and J' have the same patch ID, Git automatically drops J, so that the final result is:

                P'-Q'     <-- feature2
               /
...--E--F--G--J'          <-- master
               \
                I'-K'-L'  <-- feature1

which is just what we wanted.
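You can check the patch IDs yourself with git patch-id, and git cherry reports which candidates already have an equivalent upstream. The sketch below uses a throwaway repository and a hypothetical one-file-per-commit helper; J' is made here by cherry-picking J directly onto master.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

commit() {                                 # hypothetical helper: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G
git checkout -q -b feature1
commit I; commit J; commit K; commit L

J=$(git rev-parse feature1~2)              # original J
git checkout -q master
git cherry-pick "$J" >/dev/null            # J' becomes the tip of master
Jp=$(git rev-parse master)

id_J=$(git show "$J" | git patch-id | cut -d' ' -f1)    # patch ID of J
id_Jp=$(git show "$Jp" | git patch-id | cut -d' ' -f1)  # patch ID of J'
git cherry master feature1                 # "-" marks candidates with an equivalent on master

git checkout -q feature1
git rebase -q master                       # J is dropped automatically
git log --format=%s master..feature1
```

J and J' have different hash IDs but the same patch ID, which is why the rebase leaves J out.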


[6] All, that is, except merges. Rebase literally can't copy merges (a new merge has a different parent-set than the original, and cherry picking "undoes" a merge in the first place), so by default it skips them entirely. Merges don't get a patch ID assigned, and don't get plucked out of the set because they were never in the set. It's usually a bad idea to rebase a graph-fragment that contains a merge.

Git does have a mode that tries to do it. This mode re-performs the merges (as it has to: I leave working out the details as an exercise). But there are a bunch of dangers here, so it's usually best not to do this at all. I have said before that probably git rebase should default to "preserving" merges, but by erroring-out if there are merges, requiring either a "yes, go ahead and try to re-create merges" flag, or a "flatten away and remove merges" flag to proceed.

It doesn't, though, so it's up to you to draw the graph and make sure your rebases make sense.


When rebase goes wrong: merge conflicts

Any time you git rebase some commits, you run the risk of merge conflicts. This is particularly true if you are plucking a segment of commits out of a long chain:

        o--...--B--1--2--3--4--...--o   <-- topic
       /
...o--*--o--o--o--T                     <-- develop

If we want to "move" (copy, then remove) commits 1-4 into develop, there's a good chance some or all parts of some of those four commits depend, in some way, on the other top-row commits that come before them (B and earlier). When that happens, we tend to get merge conflicts, sometimes many. Git winds up viewing the copy of commit 1 as a three-way merge operation, merging the changes from B to 1 with the changes from B to T. The changes "from" B to T may look quite complex, and may not appear sensible out of context, because we have to "go backwards" through the commits before B down to * and then "forwards" up to T.

It is up to you to figure out how, or even whether it is wise, to do this.
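If a rebase does stop on a conflict, you can always back out. Here is a minimal sketch, assuming a deliberately conflicting pair of one-line changes to a placeholder file f.txt in a throwaway repository; it lists the conflicted files and then aborts, which puts the branch back where it was.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"            # throwaway repository (assumption)
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "base" > f.txt
git add f.txt
git commit -q -m "base"
git checkout -q -b topic
echo "topic" > f.txt
git commit -q -am "topic change"
before=$(git rev-parse topic)
git checkout -q master
echo "master" > f.txt
git commit -q -am "master change"          # touches the same line as the topic commit

git checkout -q topic
if git rebase master >/dev/null 2>&1; then
    result=clean
else
    result=conflict
    conflicted=$(git diff --name-only --diff-filter=U)   # files with unresolved conflicts
    git rebase --abort                                   # give up and restore the original branch
fi
echo "$result: $conflicted"
```

The alternative to aborting is to edit the files, git add them, and run git rebase --continue.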

When rebase goes wrong: others are still using the originals

Because rebase is fundamentally a copy operation, you must consider who might still have the original commits. Since commits can be on many branches, it may be the case that you have the originals. This is what we saw when we had both J and J', for instance.

Sometimes—even somewhat often—this may not be a big deal. Sometimes it is. If all the extra copies are only in your own repository, you can resolve all this on your own. But what happens if you have published (pushed, or let others fetch from you) some of your commits? In particular, what if some other repository has those original commits, with their original hash IDs? If you have published the original commits, you must tell everyone else who has them: "Hey, I'm abandoning the originals, I have shiny new copies elsewhere." You must get them to do the same thing, or else put up with the extra commit copies.
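One rough way to check whether a commit has been published is to ask which remote-tracking branches contain it. The sketch below is built on assumptions: a local bare repository stands in for the shared one, the remote is called origin, and the commit helper is hypothetical. It also only knows what your last push or fetch recorded, and cannot see what others have fetched from you directly.

```shell
set -e
work=$(mktemp -d) && cd "$work"            # throwaway directory (assumption)
git init -q --bare origin.git              # stand-in for a shared repository
git init -q clone
cd clone
git symbolic-ref HEAD refs/heads/master    # name the initial branch master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git remote add origin ../origin.git

commit() {                                 # hypothetical helper: one new file per commit
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

commit G
git push -q origin master
git checkout -q -b feature2
commit P
git push -q origin feature2                # P is now published
commit Q                                   # Q exists only in this repository

where_P=$(git branch -r --contains feature2^)   # remote-tracking branches that already have P
where_Q=$(git branch -r --contains feature2)    # empty: Q has not been pushed
echo "P is on: $where_P"
echo "Q is on: $where_Q"
```

In this setup, copying Q affects nobody else, while copying P leaves the original P in the shared repository.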

Extra commits are sometimes harmless. This is particularly true of merges, since git merge works hard to take only one copy of any given change (although Git cannot always get this quite right on its own, since each change—each git diff output—depends on context and other changes, and the minimal-edit-distance algorithms sometimes go wrong themselves, picking the wrong "minimum changes"). Even if they do not break the tree, though, they do clutter up the commit history. Whether and when this might be a problem is hard to predict.

Summary

For your goals, git rebase is a powerful tool. It needs a bit of care when using it, and the most important thing to remember is that it copies commits, then abandons—or tries to abandon—the originals. This can go wrong in several ways, but the worst ones tend to occur when other people already have copies of your original commits, which generally means "when you have published (pushed) them".

Drawing graphs can help. Everyone should make a habit of drawing their graphs, and/or using git log --graph (get help from "a dog": git log --all --decorate --oneline --graph, All Decorate Oneline Graph) and/or graphical browsers like gitk (although I personally hate GUIs in general :-) ). Unfortunately, "real" graphs rapidly get very messy. Git's built-in log --graph does a poor job separating rats-nest graphs. There are a lot of ad-hoc tools to deal with this, some built in to Git, but it definitely helps to have a lot of practice reading the graphs.

answered Sep 28 '22 at 04:09 by torek


If your history looks like:

...--E--F--G                <-- master
            \
             I--J--K--L     <-- feature1
                    \
                     P--Q   <-- feature2

In the simple case, to remove feature1's commits from feature2, do:

git checkout feature2
git rebase --onto master feature1

This is an abbreviation of @torek's answer, which is phenomenal, but in which the actual answer to the question is hard to find. Read @torek's answer for more details and for what to do in the non-simple cases.

answered Sep 28 '22 at 05:09 by nanotek