Merge master adds all changes to my branch

Question

I am working on a branch and have changed 5 files or so. Meanwhile, others have pushed changes to more than 100 files to master. While working on my branch, I want to merge master into my local branch every now and then. I do it like this:

git checkout master
git pull
git checkout my-branch
git merge master
git push

But now, for some reason, all the files that other people changed on master are added to my changes. So if I actually push after merging master, it shows that I changed >100 files instead of just 5. What am I doing wrong? Thanks.

torek · Accepted Answer

There isn't really any problem here: you are just misinterpreting what Git says. (The fact that Git can be misinterpreted could be considered a problem, I suppose, but in practice, whether it's Git or any other version control system, this stuff is hard and it requires learning and experience.)

There are some key things to know about Git, files, and commits:

What Git stores, at the level you interact with it, is commits. Branch names like master are useful, but they really just help Git—and you—find commits. We'll see how this works in a moment.
Commits do store files, but you will usually work with a whole commit at a time. You tell Git: get me commit X, for some X that identifies a commit, and you get all the files for that commit. You either have the commit—and hence all the files—or you don't have the commit at all, and hence you have none of the files.
Each commit has a unique ID. This ID is its hash ID and it is a big ugly string of random-looking letters and digits, such as 9fadedd637b312089337d73c3ed8447e9f0aa775. That hash ID, once it exists, means that commit, and never any other commit.
The contents of any one commit are completely, totally, 100% read-only. Neither the files stored inside a commit, nor any of the commit's metadata, can ever be changed. (The reason for this is that the hash ID is a cryptographic checksum of the contents of the commit. If you take a commit out, modify any of its bits at all, and put that back, you get a new, different commit, with a new, different hash ID. The old commit is still in there: you've just added one more commit.)
Each commit's snapshot-of-all-files is just that: a snapshot. That is, commits don't store changes at all.
But when you look at a commit, Git often shows you changes. This is a trick! But it is also a good thing, because that's usually more interesting anyway.
The reason Git can show a commit as changes is because most commits store the raw hash ID of a single previous or parent commit. So given any one commit X, Git can back up one step to find the commit that comes before X. That commit has a snapshot too.

Git can—and does—just extract the two snapshots, the parent and the child, and compare them. For each file that is the same, Git says nothing at all. For each file that is different, Git shows you a recipe: Start with the parent's copy of the file. Add this line here. Delete that one there. Repeat as needed, and when you're done adding and deleting, you'll have the version of the file that's in the child commit.
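
To make the snapshot-versus-changes distinction concrete, here is a minimal sketch in a throwaway repository. The file names and commit messages are made up for the example. It shows that the child commit still contains every file, while a parent-to-child comparison mentions only the file that differs.

```shell
set -e
# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\n' > a.txt
printf 'unchanged\n' > b.txt
git add a.txt b.txt
git commit -q -m "parent"

printf 'one\ntwo\n' > a.txt
git commit -q -a -m "child"

# The child's snapshot lists every file, including the untouched b.txt.
snapshot=$(git ls-tree -r --name-only HEAD)
echo "$snapshot"

# The commit object itself records the hash ID of its parent.
git cat-file -p HEAD

# Comparing the parent and child snapshots reports only the file that differs.
changed=$(git diff --name-only HEAD~1 HEAD)
echo "$changed"
```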

When you have a simple line of commits, all in a row, you can draw them, or think of them, like this:

... <-F <-G <-H ...

where H stands in for some hash ID that finds a commit. Commit H itself contains the hash ID of its parent, which we'll just call G. That lets Git find G. G contains the hash ID of its parent, F, which lets Git find F, and so on.

A branch name like master simply holds the hash ID of the last commit in the chain. The last commit points backwards to its parent, which points backwards again, and so on. So we can draw this as:

...--F--G--H   <-- master
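
You can check for yourself that a branch name is just a pointer to one commit. This is a small sketch in a scratch repository, with made-up file contents and the commit messages G and H standing in for the letters in the drawing:

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "first" > file.txt
git add file.txt
git commit -q -m "G"
echo "second" >> file.txt
git commit -q -a -m "H"

# The name master resolves to exactly one hash ID: the last commit in the chain.
tip=$(git rev-parse master)
echo "master holds: $tip"

# Starting from that one hash ID, Git walks backwards through the parents.
chain=$(git log --format=%s master)
echo "$chain"
```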

We don't really need to draw the connecting arrows from one commit to the next as arrows since they can't change. No part of any commit can ever change. So they'll always point backwards. The arrow coming out of a branch name, however, does change. We may start with:

...--G--H   <-- master

and then add a new branch name so that we can make new commits without touching our master yet:

...--G--H   <-- master, dev

but eventually we'll add a new commit to our branch. Let's add the special name HEAD to dev to remember that this is the name we're using—the name we used when we ran git checkout dev—and draw it like this:

...--G--H   <-- master, dev (HEAD)

Now we'll make a new commit. It will get some big ugly random-looking hash ID, but we'll just call it I, and draw it in like this:

          I
         /
...--G--H   <-- master, dev (HEAD)

I points back to H, because H is the current commit when we make I.

Now comes the clever trick: Git writes I's hash ID into a branch name. The branch name that gets changed is the one HEAD is attached to: dev. So now dev points to I instead of H:

          I   <-- dev (HEAD)
         /
...--G--H   <-- master

No existing commit has changed. (None can, after all.) But our new commit I now exists, and points back to existing commit H, and now our name dev points to commit I, which is now the current commit.
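
The same sequence can be sketched as commands. This assumes a scratch repository with one starting commit playing the role of H; the file name is made up. The point is that only the name HEAD is attached to moves.

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "H"

# Create dev pointing at H and attach HEAD to it.
git checkout -q -b dev
attached=$(git symbolic-ref --short HEAD)
before=$(git rev-parse dev)

# Make commit I while HEAD is attached to dev.
echo "more" >> file.txt
git commit -q -a -m "I"

after=$(git rev-parse dev)
parent=$(git rev-parse dev~1)
master_tip=$(git rev-parse master)

echo "HEAD is attached to: $attached"
echo "dev moved from $before to $after"
echo "master still points to $master_tip"
```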

When we make new commit J, Git does the same thing, giving us:

          I--J   <-- dev (HEAD)
         /
...--G--H   <-- master

At this point, though, we might run git checkout master and git pull (or git fetch && git merge) and acquire some new commits that someone else made. Just for symmetry I'll draw in two commits that the someone-else made. This advances our master up over their two new commits, too:

          I--J   <-- dev
         /
...--G--H
         \
          K--L   <-- master (HEAD)

The current branch is now master and the current commit is now L. You might wonder why I drew them on a separate line: it's mostly to emphasize that commits up through H are on both branches. This strange fact—that commits can be on more than one branch at a time—is somewhat peculiar to Git.

We can now run git checkout dev to prepare to merge master into dev. This first step just moves HEAD over to dev:

          I--J   <-- dev (HEAD)
         /
...--G--H
         \
          K--L   <-- master

We can now merge the two branches. We're really merging commits, because Git is all about commits, but let's see how this works.

In our commits I-J, we made some changes to some files. In their commits K-L, they—whoever they are—made some changes to some files. We are about to make a new merge commit and this merge commit will hold a snapshot, just like every commit. What should go into this snapshot?

The answer is: we'd like this snapshot to combine our work with their work. That is, we'd like to start with every file from a shared, common commit. The best shared common starting-point is clear from the diagram: it's commit H. That commit is on both branches. So is G, but H is better because it's the closest thing to J and L.

So, Git will start with whatever is in H. It will compare H vs J, to see what we changed. Each file that we changed has a recipe: add some lines, delete some lines. Then, Git will start again with whatever is in H, and compare H vs L, to see what they changed. Each file that they changed has a recipe: add some lines, delete some lines.
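
If you want to see these inputs to the merge yourself, git merge-base finds the shared commit and two git diff commands show each side's changes. The sketch below rebuilds the drawing in a scratch repository; the file names ours.txt and theirs.txt are made up, and the commit messages match the letters in the graph.

```shell
set -e
# Scratch repository that mimics the H / I-J / K-L graph; names are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b dev
echo "1" > ours.txt
git add ours.txt
git commit -q -m "I"
echo "2" >> ours.txt
git commit -q -a -m "J"

git checkout -q master
echo "1" > theirs.txt
git add theirs.txt
git commit -q -m "K"
echo "2" >> theirs.txt
git commit -q -a -m "L"

# The best shared commit of the two branch tips.
base=$(git merge-base dev master)

# What we changed (H vs J) and what they changed (H vs L).
ours=$(git diff --name-only "$base" dev)
theirs=$(git diff --name-only "$base" master)
echo "merge base: $base"
echo "we changed: $ours"
echo "they changed: $theirs"
```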

Git now combines these change-recipes. Wherever we changed a file and they didn't, the result is our file. Wherever they changed a file and we didn't, the result is their file. If we both changed one particular file, Git combines our changes. This is the hard part of the merge: combining changes.

If the lines we changed are different from the lines they changed (and the recipes don't have adjacent or abutting lines either), Git will be able to combine these changes on its own. Or, if we and they make the exact same change to some line(s)—e.g., if we both fixed the same spelling error somewhere—Git will just take one copy of the change. Otherwise—if we changed a line in a different way than they did—Git will produce a merge conflict error for that file, and leave us with a mess to clean up.
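
Here is a small sketch of the conflict case, again in a scratch repository with a made-up file. Both sides change the same line in different ways, so git merge stops with a non-zero exit status and leaves the file marked as unmerged.

```shell
set -e
# Scratch repository; the file name and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'color = red\n' > settings.txt
git add settings.txt
git commit -q -m "base"

git checkout -q -b dev
printf 'color = blue\n' > settings.txt
git commit -q -a -m "our change"

git checkout -q master
printf 'color = green\n' > settings.txt
git commit -q -a -m "their change"

git checkout -q dev
# git merge exits non-zero when it stops with conflicts.
if git merge -q -m "Merge master into dev" master; then
  outcome="merged"
else
  outcome="conflict"
fi

# List the files Git could not combine on its own.
conflicted=$(git diff --name-only --diff-filter=U)
echo "$outcome: $conflicted"

# Back out of the half-finished merge.
git merge --abort
```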

Having merged all files, to the best of its ability, Git now either stops with merge conflicts, or did not have any merge conflicts and goes on to make a merge commit. Let's assume there were no conflicts, to make things easy.

The only thing that is special about this merge commit is that, instead of one parent, it has two. We can draw it like this:

          I--J
         /    \
...--G--H      M   <-- dev (HEAD)
         \    /
          K--L   <-- master

The first parent of new commit M is commit J, which advances the branch dev one step as usual. The second parent of new commit M is commit L, which is still the tip commit of branch master. Nothing happens to the name master and no existing commit has changed (since none can), but new merge commit M makes it so that commits K and L are now on branch dev too, along with commits up through J.
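
The two parents are easy to inspect. In this sketch (scratch repository, made-up file names, one commit per side to keep it short), HEAD^1 and HEAD^2 name the first and second parent of the merge commit:

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "H"

git checkout -q -b dev
echo "ours" > ours.txt
git add ours.txt
git commit -q -m "J"
j=$(git rev-parse HEAD)

git checkout -q master
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "L"
l=$(git rev-parse HEAD)

git checkout -q dev
git merge -q -m "M" master

# First parent is our old tip, second parent is the tip of master.
p1=$(git rev-parse HEAD^1)
p2=$(git rev-parse HEAD^2)
master_after=$(git rev-parse master)

git log --graph --oneline
```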

Why merges work

If we now ask Git: where did some particular line (line 42, say) of some particular file F come from, Git can look at the snapshot in M, then look at both the snapshots in J and L. If line 42 of F matches in M and J but is different in M and L, then line 42 "came from" J: the merge kept the line from J. Git will now step back one more commit, to I, to see if line 42 in F matches in I and J. If they're different there, Git will say that line 42 came from the person who made commit I, on the date that they made commit I.

If line 42 of F matches in M and L, though, and is different in J, that means the merge kept line 42 from L. So Git should step back to L, and then K, and so on as needed.

If line 42 matches in M, L, and J, it probably came through unchanged from H and Git will keep marching back, one commit at a time, to see if it changed in the G-to-H transition, or if it came from an even-earlier change.

The command that looks at particular lines of one particular file is git blame (or git annotate). Note that, like so many Git commands, it must work through commits, one step at a time, marching backwards through time. These commits, one at a time, are the history in the repository. History is commits; commits are history.
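
A short sketch of git blame on a two-line file, in a scratch repository with made-up contents. The --porcelain output starts each entry with the full hash ID of the commit the line is attributed to, which makes it easy to compare against the commits we just made.

```shell
set -e
# Scratch repository; file name and contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

printf 'first line\nsecond line\n' > notes.txt
git add notes.txt
git commit -q -m "add notes"
c1=$(git rev-parse HEAD)

printf 'first line\nsecond line, edited\n' > notes.txt
git commit -q -a -m "edit second line"
c2=$(git rev-parse HEAD)

# Human-readable view: one commit hash per line of the file.
git blame -s notes.txt

# Machine-readable view: take the hash ID from the first field of the header line.
line1=$(git blame --porcelain -L 1,1 notes.txt | head -n 1 | cut -d ' ' -f 1)
line2=$(git blame --porcelain -L 2,2 notes.txt | head -n 1 | cut -d ' ' -f 1)
echo "line 1 came from $line1"
echo "line 2 came from $line2"
```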

You must not take out someone else's changes (unless they are wrong)

Git treats the result of any merge as the correct file, by definition. A future merge will assume that whatever you put in is right. If you take out their changes, you are saying that their code was bad and should be forgotten.

If that's actually the case, it is OK to remove this code—but you probably should do that in another, separate commit, rather than doing it directly in the merge.
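
One way to do that, and this is my suggestion rather than the only option, is to finish the merge untouched and then undo the unwanted commit with git revert. The sketch below uses a scratch repository and a made-up file, unwanted.txt, standing in for their bad change.

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "H"

git checkout -q -b dev
echo "ours" > ours.txt
git add ours.txt
git commit -q -m "J"

git checkout -q master
echo "oops" > unwanted.txt
git add unwanted.txt
git commit -q -m "K: a change that turns out to be wrong"
k=$(git rev-parse HEAD)

# Merge as-is: the merge snapshot keeps their file.
git checkout -q dev
git merge -q -m "M" master
m=$(git rev-parse HEAD)

# Undo their change in its own commit, on top of the merge.
git revert --no-edit "$k"

git log --oneline --graph
```

The merge commit stays an honest record of what was combined, and the revert commit records, separately, that the change was taken out on purpose.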

Side notes about fast-forward merges

Although we didn't cover it properly here, Chuck Lu's answer mentions fast-forward merges. Suppose we draw a series of commits like this:

...--C--D--E   <-- branch1 (HEAD)
            \
             F--G--H   <-- branch2

indicating that we have branch branch1, and hence commit E, checked out right now. If we run git merge branch2, Git will find that the best common commit, on both branches, is the current commit E. In this case, Git does not have to do a real merge. Given the option, Git will do a fast-forward operation instead, by, in essence, doing a git checkout of commit H, but dragging the branch name branch1 forward in the process:

...--C--D--E
            \
             F--G--H   <-- branch1 (HEAD), branch2

(There is now no reason to keep the diagonal line in the drawing; feel free to take it out when you draw this yourself.)

When Git does this operation, it also does a comparison of the snapshot in the old commit E with the newly-current commit H. For each file that changed, it tells you something about that change.

You can see the same comparison by running:

git diff --stat <hash-of-E> HEAD

Since HEAD now names commit H, this git diff compares the snapshot in E to the snapshot in H, which is exactly the comparison that git merge (or the git merge run by git pull) made, and therefore prints the same information again.
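
A sketch of the fast-forward case in a scratch repository. The file name new.txt is made up, and instead of typing the hash of E by hand, the script saves it in a shell variable before merging. The --ff-only flag makes Git refuse to do anything other than a fast-forward.

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch1
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "E"

git checkout -q -b branch2
echo "new" > new.txt
git add new.txt
git commit -q -m "F"
echo "more" >> new.txt
git commit -q -a -m "G"

git checkout -q branch1
old=$(git rev-parse HEAD)   # the hash of E

# No real merge is needed: branch1 is simply dragged forward.
git merge --ff-only branch2

# Repeat the comparison that the fast-forward just reported.
stat=$(git diff --stat "$old" HEAD)
echo "$stat"
```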

When you do a real merge (as we did with M), the summary you see right at that time is based on a comparison of the snapshot in your previous commit (J) with the snapshot in M. Since M combines the changes from both "sides" of the branch, but J already has your changes, what you see is their changes. That is why merging master appears to add more than 100 files: those are the files they changed, not new changes of yours. You can, however, run git diff --stat master dev to compare commit L vs commit M: this time, you will see what the merge brought from "your side" of the branch.
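
Here is that situation in miniature, as a sketch in a scratch repository. One made-up file, mine.txt, plays the role of your 5 files, and three made-up files play the role of their >100 files.

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "H"

git checkout -q -b dev
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "J"

git checkout -q master
for n in 1 2 3; do
  echo "theirs $n" > "theirs$n.txt"
done
git add .
git commit -q -m "L"

git checkout -q dev
git merge -q -m "M" master

# J vs M: what the merge summary reports, which is their work.
brought_in=$(git diff --name-only HEAD^1 HEAD)

# L vs M: what is on dev but not on master, which is your work.
mine=$(git diff --name-only master dev)

echo "merge brought in:"
echo "$brought_in"
echo "your own changes:"
echo "$mine"
```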

It's hard to see what's in a real merge M in general because of its two parents. You need two separate git diff commands to see it properly, really. The git show command can do this automatically if you give it the -m flag, but we won't cover that here.
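
For the curious, a brief sketch of what the -m flag does, using a scratch repository with made-up file names. With -m, git show compares the merge against each parent in turn, so both sides show up in the output.

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "H"

git checkout -q -b dev
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "J"

git checkout -q master
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "L"

git checkout -q dev
git merge -q -m "M" master

# One diffstat per parent: M vs J, then M vs L.
git show -m --stat HEAD

# The same thing reduced to file names, with the commit header suppressed.
names=$(git show -m --name-only --format= HEAD)
echo "$names"
```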

Chuck Lu · Answer

There are two kinds of git merge, fast-forward and no-fast-forward.

It seems that you encountered the no-fast-forward type, which will generate a new merge commit.

If you do not want to generate a merge commit, you could try git rebase instead.

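git checkout master
git pull
git rebase master my-branch   # might encounter conflicts here
git push --force-with-lease   # a plain "git push" is rejected if my-branch was already pushed before the rebase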
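
To see what the rebase does to the history, here is a sketch in a scratch repository with no remote. A local commit on master stands in for what git pull would fetch, and the file names are made up. After the rebase there is no merge commit, and my-branch sits directly on top of master.

```shell
set -e
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b my-branch
echo "mine" > mine.txt
git add mine.txt
git commit -q -m "my work"

# Stand-in for "git pull" bringing in other people's commits.
git checkout -q master
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "their work"

# Replay my-branch's commits on top of the updated master.
git rebase -q master my-branch

merges=$(git rev-list --count --merges master..my-branch)
base=$(git merge-base master my-branch)
mine=$(git diff --name-only master my-branch)

git log --oneline --graph
echo "merge commits on my-branch: $merges"
echo "files changed relative to master: $mine"
```

Note that the rebased commits are new commits with new hash IDs, which is why a branch that was already pushed needs a forced push afterwards.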

You can find an animated demo of rebase here

Tags: git, github

suuuriam
