
git merge updates files not changed on source branch

Tags: git, git-merge

I'm struggling to understand how the following behavior is a good thing in git. See below for an example I put together to illustrate my problem. My team and I often end up with changes/commits going into branches where we did not want them.

> git init sandbox && cd sandbox
> echo "data a" > a.txt && echo "data b" > b.txt
> git add -A && git commit -a -m "initial population"
[master (root-commit) d7eb6af] initial population
 2 files changed, 2 insertions(+)
 create mode 100644 a.txt
 create mode 100644 b.txt
> git branch branch1
> echo "more data a" >> a.txt && git commit -a -m "changed a.txt on master"
[master 11eb82a] changed a.txt on master
 1 file changed, 1 insertion(+)
> git branch branch2 && git checkout branch2
Switched to branch 'branch2'
> echo "more data b" >> b.txt && git commit -a -m "changed b.txt on branch2"
[branch2 25b38db] changed b.txt on branch2
 1 file changed, 1 insertion(+)
> git checkout branch1
Switched to branch 'branch1'
> git merge branch2
Updating d7eb6af..25b38db
Fast-forward
 a.txt | 1 +
 b.txt | 1 +
 2 files changed, 2 insertions(+)

Notice in the above that a.txt is updated in the merge, even though it was not touched/modified on branch2. In this scenario I would expect git to be intelligent enough to recognize that a.txt was not changed on branch2 and therefore, when applying updates to branch1, not make those changes.

Is there something I'm doing wrong? Yes, I could cherry-pick, and that would work for this simplistic example where I know what I changed, but it is not realistic under real circumstances where the changes are much larger and you don't know what might have been affected.

To be clear, I do not want this behavior from git.

asked Jan 22 '13 02:01 by djschny


2 Answers

'branch1' and 'branch2' are nothing but commit pointers. They are states of the commit history at certain moments in time. As such, when merging 'branch2' into 'branch1', git does little more than establish a common ancestor and attempt to apply changes from both trees, together.

Take a simple diagram:

 branch1       E <- branch2
    |         /
    v        /
A - B - C - D <- master

In the example above, 'branch1' points at commit B and 'branch2' points at commit E. This describes, more or less, the order of operations you entered above. Were you to merge 'branch2' into 'branch1', git would find a common ancestor in B then apply all the history that exists between B and E to 'branch1', specifically commits C, D, and E.
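
To see this for yourself, here is a minimal sketch that rebuilds the sandbox from the question in a throwaway directory and asks git what a merge of 'branch2' into 'branch1' would bring along. The temporary directory, the placeholder identity, and the variable names are my own assumptions, not part of the question. Note that in the question's actual sandbox the common ancestor is the initial commit, where 'branch1' still sits, rather than commit B of the simplified diagram.

```shell
#!/bin/sh
# Sketch only: recreate the question's history in a temporary repository.
set -e
repo=$(mktemp -d)                          # throwaway location (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the question
git config user.name "Sandbox User"        # placeholder identity so commits succeed
git config user.email "sandbox@example.com"

echo "data a" > a.txt && echo "data b" > b.txt
git add -A && git commit -q -m "initial population"
git branch branch1
echo "more data a" >> a.txt && git commit -q -a -m "changed a.txt on master"
git checkout -q -b branch2
echo "more data b" >> b.txt && git commit -q -a -m "changed b.txt on branch2"

# Common ancestor of the two branches; here it is branch1's own tip.
base=$(git merge-base branch1 branch2)
echo "merge base: $base"

# Commits reachable from branch2 but not from branch1.
# A merge brings in all of them, including the a.txt commit made on master.
incoming=$(git log --format=%s branch1..branch2)
echo "$incoming"

# Files that differ between the two branch tips.
changed=$(git diff --name-only branch1 branch2)
echo "$changed"
```

Both commits show up in the `branch1..branch2` range, which is why the merge touches a.txt as well as b.txt.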

What you want, however, is just E. One (bad) solution would be cherry-picking, as you've already identified. A much better solution is rebasing 'branch2' onto 'branch1', thereby rewriting 'branch2's history to include only commit E past 'branch1':

git rebase --onto branch1 master branch2

That results in exactly what you seek, and reads as 'rebase branch2, which was originally based on master, onto branch1'. Note, I've left the 'branch1' pointer out of this diagram for simplicity, and E became E' because its commit hash changed (as is a common convention with these diagrams):

       E' <- branch2
      /
     /
A - B - C - D <- master
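
As a rough end-to-end sketch of that rebase, applied to the sandbox from the question rather than the simplified diagram: the script below rebuilds the history in a throwaway directory (the directory and the placeholder identity are assumptions of mine), rebases 'branch2' onto 'branch1', and then merges. It assumes the b.txt commit applies cleanly without the a.txt commit, which holds here because the two commits touch different files.

```shell
#!/bin/sh
# Sketch only: recreate the question's history, then rebase before merging.
set -e
repo=$(mktemp -d)                          # throwaway location (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the question
git config user.name "Sandbox User"        # placeholder identity so commits succeed
git config user.email "sandbox@example.com"

echo "data a" > a.txt && echo "data b" > b.txt
git add -A && git commit -q -m "initial population"
git branch branch1
echo "more data a" >> a.txt && git commit -q -a -m "changed a.txt on master"
git checkout -q -b branch2
echo "more data b" >> b.txt && git commit -q -a -m "changed b.txt on branch2"

# Replay only the commits in master..branch2 (the b.txt commit) on top of branch1.
git rebase -q --onto branch1 master branch2

# branch2 is now exactly one commit ahead of branch1, so the merge fast-forwards.
git checkout -q branch1
git merge -q branch2

cat a.txt    # prints only "data a": the change made on master was not pulled in
cat b.txt
```

After this, 'branch1' carries the b.txt change but not the a.txt change, while 'master' is left untouched.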

You could get a similar effect with git checkout branch2 && git rebase -i B, then removing the lines for commits C and D in the interactive rebase session.

At my last job we routinely faced this problem with isolated feature branches. Cut at different moments in time from the same production branch, they would pull along unwanted changes if merged without rebasing. As an integration manager, I routinely rewrote their histories to a common point in the past (the last production release), thereby allowing clean merges all the way through. It's one of many possible workflows. The best answer depends heavily on how your team moves code around. In a CI environment, for example, it's sometimes less important that C and D get pulled along with merges like the one you describe.

Finally, note that if E depends on any code in C or D, this solution will wreak havoc on your history when merging 'branch1' (now containing the E' change set) back into 'master'. If your workflow is incremental, and 'branch1' and 'branch2' meddle in similar functions and files, merge conflicts will arise as a matter of course. In that case, a closer look at your team's workflow is probably warranted.

answered Nov 15 '22 10:11 by Christopher


If the commands above are completely and correctly typed, then git is correct. Here's what you did:

  1. created a repo (defaults to branch "master")
  2. added one changeset (2 new files) to "master"
  3. created a branch ("branch1") but did not change to it
  4. added one changeset (changed a.txt) on "master"
  5. created a branch ("branch2") and changed to it (this branch includes a.txt modified)
  6. added one changeset (changed b.txt) on "branch2"
  7. switched to "branch1" (contains two original, unchanged files)
  8. merged (fast-forward) with "branch2" (applied both changes: a.txt and b.txt)

This is exactly what you describe and exactly what should happen. Where you probably went wrong was in thinking you were changing a.txt on "branch1", when you really changed it on "master" before you created "branch2". That gives the illusion that changes magically appeared on "branch1" from "master" when merging with "branch2", but in reality the change came from "branch2", which already contained it.

If you repeat your test but switch to "branch1" at step 3 (git checkout -b branch1), so that the a.txt change is committed there instead of on "master", I think you'll get the merge you expect.
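
Here is a minimal sketch of that repeated test, under the assumption that the only change is switching to "branch1" at step 3. The temporary directory, the placeholder identity, and the variable names are mine, added so the script runs on its own.

```shell
#!/bin/sh
# Sketch only: same steps as the question, but step 3 switches to branch1.
set -e
repo=$(mktemp -d)                          # throwaway location (assumption)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # use the branch name from the question
git config user.name "Sandbox User"        # placeholder identity so commits succeed
git config user.email "sandbox@example.com"

echo "data a" > a.txt && echo "data b" > b.txt
git add -A && git commit -q -m "initial population"

git checkout -q -b branch1                 # step 3: create the branch AND switch to it
echo "more data a" >> a.txt && git commit -q -a -m "changed a.txt on branch1"

git branch branch2 && git checkout -q branch2
echo "more data b" >> b.txt && git commit -q -a -m "changed b.txt on branch2"

git checkout -q branch1
before=$(git rev-parse HEAD)               # remember where branch1 was before the merge
git merge -q branch2

# Files the merge changed on branch1.
touched=$(git diff --name-only "$before" HEAD)
echo "$touched"
```

With this ordering the merge only brings b.txt into "branch1", because the a.txt change was already part of "branch1" before "branch2" was created.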

answered Nov 15 '22 08:11 by BrionS