Git history for branch after merge

I am a bit confused on how git stores history after merging.

I have merged branch A into branch B successfully. Now, when I look at the history of a file in branch B that was part of the merge, I see all of that file's history from branch A, but none of its history from branch B. Where has the branch B history for that file gone?

I merged with git merge <branch>: in this case, I was on branch B and ran git merge A.


For example, in branch A I had the following commits: a, aa, aaa corresponding to different files.

In branch B, I had the following commits: b, bb, bbb corresponding to different files.

Now, after merging branch A into branch B, all I see in branch B's git log is the a, aa, aaa history. I don't see my b history.

In essence, I want my merge to be linear: when I merge A into B, I want the history to contain all of branch B's history, with the merge that just occurred on top of it, similar to how SVN does it.

My current git log history is very confusing.

Asked by Robin, Dec 13 '22 18:12


1 Answer

TL;DR

The history isn't gone, Git just isn't showing it.

In Git, the history is the set of commits. There is no file history!

When you run a command like git log dir/sub/file.ext, or for that matter, git log dir/sub or git log . while in dir/sub, Git will synthesize a (temporary) file history, by extracting some sub-history from the real history—the set of commits. This synthetic process deliberately drops some commits. For instance, it drops all commits that don't affect any of the files you have asked about. But by default, it drops a lot more than that, via something that git log calls History Simplification.

Longer

Every commit has a unique hash ID. You see these in git log output, for instance. The hash ID is actually just a cryptographic checksum of the commit's content.

Each commit stores (the hash ID of) a snapshot of files—Git calls this a tree. This is true of merge commits as well: a merge commit, like any other commit, has a tree.

Each commit also stores a name, email address, and time-stamp for the author and again for the committer, so that Git can show these to you. It stores a log message (whatever you give it) so that Git can show that as well.

The last thing that Git stores in a commit—the second thing, really, right after the tree—is a list of parent commits, by their unique hash IDs.
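To make "the hash ID is just a checksum of the content" concrete, here is a small Python sketch of how a classic SHA-1 repository computes object IDs: SHA-1 over a `<kind> <size>` header, a NUL byte, and the raw content. The commit layout (tree line, parent lines, author, committer, blank line, message) follows the description above; the function names, the identity, and the time-stamp are made up for illustration, and repositories using SHA-256 hash differently.

```python
import hashlib

def git_object_id(kind: str, content: bytes) -> str:
    """SHA-1 of '<kind> <size>', a NUL byte, then the raw object content."""
    header = f"{kind} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

def commit_content(tree, parents, author, committer, message):
    """Raw text of a commit object: tree, parent line(s), author, committer, blank line, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # none for a root commit, two or more for a merge
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode()

# Well-known IDs: the empty tree and the empty blob.
EMPTY_TREE = git_object_id("tree", b"")
print(EMPTY_TREE)                  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Hypothetical identity and time-stamp, for illustration only.
who = "Robin <robin@example.com> 1670954400 +0000"
root = git_object_id("commit", commit_content(EMPTY_TREE, [], who, who, "first commit\n"))
child = git_object_id("commit", commit_content(EMPTY_TREE, [root], who, who, "second commit\n"))
print(root != child)  # True: same tree, but a different parent list and message
```

Because the parent IDs are part of the hashed content, a commit's ID pins down its entire ancestry: you cannot change any earlier commit without changing every ID after it.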

Linear history is easy

When dealing with ordinary, non-merge commits, it's pretty straightforward to look at the history. We simply start with the latest commit, as identified by some branch name like master, and work backwards. The branch name contains the hash ID of the last commit—the tip of the branch—and we say that the branch name points to that commit:

... <--1234567...   <--master

If commit 1234567 is the tip of master, git log can show you commit 1234567 ... and commit 1234567 has inside it the hash ID of the commit that comes right before 1234567.

If we swap out real hash IDs for single letters, to make things easier, we get something like this:

A <-B <-C <-D <-E <-F <-G   <--master

Commit G points back to commit F, which points back to E, and so on until we reach the very first commit, commit A. This commit does not point anywhere—it can't, it was the first commit; it cannot have a parent—so this is where the history ends (starts?), at the beginning of time. Git calls A a root commit: a commit with no parent.

It's easy to show linear history, starting at the end of time and ending at the start. Git just picks out each commit one at a time and shows it. That's what:

git log master

does: it starts with the one commit identified by master, and shows it, and then shows the one commit's one parent, and then shows the one before that, and so on.
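As a toy model (not Git's actual code), you can picture the repository as a table from each commit to its parent list, plus a branch name holding the tip. Walking linear history is then just "show this commit, move to its one parent" until you hit the root. The dictionaries and the `log_linear` name here are hypothetical.

```python
# Hypothetical repository: each commit maps to the list of its parents' IDs.
parents = {
    "A": [],       # root commit: no parent
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["D"],
    "F": ["E"],
    "G": ["F"],
}
branches = {"master": "G"}  # a branch name just holds the tip commit's ID

def log_linear(tip: str) -> list:
    """Follow single-parent links backwards from the tip, like git log on linear history."""
    shown = []
    commit = tip
    while commit is not None:
        shown.append(commit)
        ps = parents[commit]
        if len(ps) > 1:
            raise ValueError(f"{commit} is a merge; this sketch handles linear history only")
        commit = ps[0] if ps else None  # stop after the root commit
    return shown

print(log_linear(branches["master"]))  # ['G', 'F', 'E', 'D', 'C', 'B', 'A']
```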

When you have Git show you a commit, you can (and in fact you almost always do) have Git show it as a patch, rather than as a snapshot. For instance, git log --patch does this. To show a commit as a patch, Git just looks at the commit's parent's tree first, then at the commit's tree, and compares the two. Since both are snapshots, whatever changed from the parent's snapshot to the child's must be whatever the person who made the child commit actually did.
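Here is a rough Python sketch of that idea, with snapshots modeled as plain `path -> content` dictionaries (an assumption for illustration; real trees are nested objects identified by hash IDs). The "patch" is just a classification of which paths were added, deleted, or modified between the parent's snapshot and the child's, and a root commit is compared against an empty snapshot.

```python
def diff_trees(old: dict, new: dict) -> dict:
    """Compare two snapshots (path -> file content) and classify each changed path."""
    changes = {}
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes[path] = "added"
        elif path not in new:
            changes[path] = "deleted"
        elif old[path] != new[path]:
            changes[path] = "modified"
    return changes

def commit_as_patch(commit: str, parents: dict, trees: dict) -> dict:
    """Show a non-merge commit as a change: parent's snapshot vs. this commit's snapshot."""
    ps = parents[commit]
    if len(ps) > 1:
        raise ValueError("a merge has more than one parent to compare against")
    parent_tree = trees[ps[0]] if ps else {}  # a root commit is compared with an empty snapshot
    return diff_trees(parent_tree, trees[commit])

# Hypothetical history: every commit stores a full snapshot, not a change.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
trees = {
    "A": {"README": "hello\n"},
    "B": {"README": "hello\n", "main.c": "int main(void) { return 0; }\n"},
    "C": {"README": "hello, world\n", "main.c": "int main(void) { return 0; }\n"},
    "D": {"README": "hello, world\n"},
}
for c in "ABCD":
    print(c, commit_as_patch(c, parents, trees))
# A {'README': 'added'}
# B {'main.c': 'added'}
# C {'README': 'modified'}
# D {'main.c': 'deleted'}
```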

Non-linear history is harder

Now that we know that Git works backwards, let's take a look at more complex history, including history that includes an actual merge commit. (Let's not get sidetracked by the fact that git merge does not always merge!)

A merge commit is simply a commit with at least two parents. In most cases you won't see commits with three or more parents—Git calls these octopus merges, and they don't do anything you cannot do with ordinary merges, so octopus merges are mainly for showing off your Git-fu. :-)

We normally get a merge by doing git checkout somebranch; git merge otherbranch, and we can draw the resulting commit chain like this:

...--E--F--G------M   <-- master
         \       /
          H--I--J   <-- feature

Now, suppose you run git log master (note: no --patch option). Git should of course show you commit M first. But which commit will Git show next? J, or G? If it shows one of those, which one should it show after that?

Git has a general answer to this problem: when it shows you a merge commit, it can add both parents of the commit to a queue of "commits yet to be shown". When it shows you an ordinary non-merge commit, it adds the (single) parent to the same queue. It can then loop through the queue, showing you commits one at a time, adding their parents to the queue.

When the history is linear, the queue has one commit in it at a time: the one commit gets removed and shown, and the queue now has the one parent in it and you see the parent.

When the history has a merge, the queue starts with one commit, Git pops the commit off the queue and shows it, and puts both parents in the queue. Then Git picks one of the two parents and shows you G or J, and puts F or I into the queue. The queue still has two commits in it. Git pops one off and shows that commit and puts another one on.

Eventually Git tries to put F on the queue when F is already on the queue. Git avoids adding it twice, so eventually the queue depth reduces to one commit again, in this case showing F, E, D, and so on. (The details here are a bit complicated: the queue is specifically a priority queue with the priority being determined by additional git log sorting parameters, so there are different ways that this can happen.)
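The queue-based walk can be sketched in Python with `heapq`, using commit time-stamps as the priority (newest first) and a set so that no commit is ever queued twice. This is a simplified model of the default ordering only; real git log has further ordering rules and options such as --date-order and --topo-order, and the time-stamps below are invented so the output is predictable.

```python
import heapq

# The graph from the drawing: M is a merge whose parents are G (first) and J (second).
parents = {
    "C": [], "D": ["C"], "E": ["D"], "F": ["E"], "G": ["F"],
    "H": ["F"], "I": ["H"], "J": ["I"],
    "M": ["G", "J"],
}
# Hypothetical commit time-stamps (larger = newer), used as the queue priority.
commit_time = {"C": 1, "D": 2, "E": 3, "F": 4, "H": 5, "G": 6, "I": 7, "J": 8, "M": 9}

def log_walk(tip: str) -> list:
    """Show commits newest-first, using a priority queue and never queueing a commit twice."""
    queue = [(-commit_time[tip], tip)]  # heapq is a min-heap, so negate to pop the newest first
    queued = {tip}
    shown = []
    while queue:
        _, commit = heapq.heappop(queue)
        shown.append(commit)
        for p in parents[commit]:       # one parent normally, two (or more) at a merge
            if p not in queued:         # F is reached from both G and H, but queued only once
                queued.add(p)
                heapq.heappush(queue, (-commit_time[p], p))
    return shown

print(log_walk("M"))  # ['M', 'J', 'I', 'G', 'H', 'F', 'E', 'D', 'C']
```

Note how the output interleaves the two sides of the merge (J, I, then G, then H): one-commit-at-a-time output hides the fact that the history is not a single line.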

You can view connections with git log --graph

If you add --graph to your git log command, Git will draw a somewhat crude ASCII-art graph with lines connecting child commits back to their parents. This is very helpful in telling you that the commit history you are viewing is not linear after all, even though git log is showing you one commit at a time (because it must).

Showing merge commits

I mentioned above that with -p or --patch, git log will show what changed in a commit by comparing the parent's snapshot/tree against the child's snapshot/tree. But a merge commit has two (or even more) parents, so there is no single parent-vs-child comparison to show.

What git log does, by default, is to give up entirely. It simply doesn't show a patch. Other commands do something more complicated, and you can convince git log to do that too, but let's just note that the default is for git log to give up here.
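A tiny Python illustration of why there is no single answer, using made-up snapshots: the mainline side (G) changed b.txt, the feature side (J) changed a.txt, and the merge M has both. Compared against G, the merge "changed" a.txt; compared against J, it "changed" b.txt. Splitting a merge into one comparison per parent like this is roughly what the -m option asks git log -p to do; the snapshot dictionaries and function names are hypothetical.

```python
def changed_paths(old: dict, new: dict) -> set:
    """Paths whose content differs between two snapshots (added, deleted, or modified)."""
    return {path for path in old.keys() | new.keys() if old.get(path) != new.get(path)}

# Hypothetical snapshots: the mainline changed b.txt (in G), the feature changed a.txt (in J).
trees = {
    "G": {"a.txt": "a0", "b.txt": "b1"},
    "J": {"a.txt": "a1", "b.txt": "b0"},
    "M": {"a.txt": "a1", "b.txt": "b1"},
}
parents = {"M": ["G", "J"]}

def per_parent_diffs(commit: str) -> dict:
    """One comparison per parent; a merge therefore has no single 'the' diff."""
    return {p: changed_paths(trees[p], trees[commit]) for p in parents[commit]}

print(per_parent_diffs("M"))  # {'G': {'a.txt'}, 'J': {'b.txt'}}
```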

History Simplification (see the git log documentation)

When you run git log file.ext, Git will deliberately skip any non-merge commit where the diff (as obtained by comparing parent to child) does not touch file.ext. That's natural enough: if you have a chain like:

A--B--C--D--E   <-- master

and you changed (or first created) file.ext when you made commits A and E, you'd like to see just those two commits. Git can do this by figuring out a patch for D-vs-E and seeing that file.ext changed (so it should show E), then moving on to D. The C-vs-D comparison shows no change to file.ext, so Git won't show D, but it will put C in the priority queue and go on to visit C. That, too, has no change to the file, so Git eventually moves on to B, which has no change, and Git moves to A. For comparison purposes, all files in A are always new—that's the rule for any root commit; all files are added—so Git shows you A as well.
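The walk just described can be modeled in a few lines of Python for the linear case: visit each commit newest-first, compare the path's content in the parent against the commit, and keep only the commits where it differs (a root commit counts every file as added). The snapshots are invented so that file.ext is created in A and changed in E, while B, C, and D touch only other.txt; `file_log` is a hypothetical name.

```python
def file_log(tip: str, path: str, parents: dict, trees: dict) -> list:
    """Walk a linear history newest-first, keeping only commits that change `path`."""
    shown = []
    commit = tip
    while commit is not None:
        ps = parents[commit]
        if len(ps) > 1:
            raise ValueError("this sketch only handles linear (non-merge) history")
        parent_tree = trees[ps[0]] if ps else {}  # root commit: every file counts as added
        if parent_tree.get(path) != trees[commit].get(path):
            shown.append(commit)
        commit = ps[0] if ps else None  # keep walking even when nothing was shown
    return shown

parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
trees = {
    "A": {"file.ext": "v1"},
    "B": {"file.ext": "v1", "other.txt": "1"},
    "C": {"file.ext": "v1", "other.txt": "2"},
    "D": {"file.ext": "v1", "other.txt": "3"},
    "E": {"file.ext": "v2", "other.txt": "3"},
}
print(file_log("E", "file.ext", parents, trees))   # ['E', 'A']
print(file_log("E", "other.txt", parents, trees))  # ['D', 'C', 'B']
```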

We just saw, though, that by default git log doesn't like to compute patches for a merge. It's too hard! So git log generally won't show you the merge here. It does, however, try to simplify away parts of the commit graph that make no difference to the end result. As the documentation puts it, the default mode:

prunes some side branches if the end result is the same ...

If the commit was a merge, and [the file is the same as in] one parent, follow only that parent. ... Otherwise, follow all parents.

So at a merge commit like M in our graph, Git will do a fast check: is file.ext the same in M as in G? If so, add G to the queue. If not, is it the same in M as in J? If so, add J to the queue. Otherwise—i.e., file.ext is different in M than in both G and J—add both G and J to the queue.
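Here is a toy Python model of that merge rule, built to mirror the question. In this made-up history, R created file.txt, the mainline commit G and the feature commit H both edited it, and the merge M kept the feature's version. Since file.txt in M matches J, the default mode follows only J, so G (the mainline edit) silently drops out of the file's log; with `full_history=True` (modeling --full-history) every parent is followed and G comes back. This is only a sketch: it lists non-merge commits only, uses plain breadth-first order rather than Git's date order, and the names are hypothetical.

```python
from collections import deque

# Hypothetical history: R created file.txt; G (mainline) and H (feature) both edited it;
# the merge M kept the feature's version of file.txt.
parents = {"R": [], "G": ["R"], "H": ["R"], "I": ["H"], "J": ["I"], "M": ["G", "J"]}
trees = {
    "R": {"file.txt": "v0"},
    "G": {"file.txt": "mainline edit"},
    "H": {"file.txt": "feature edit"},
    "I": {"file.txt": "feature edit", "other.txt": "x"},
    "J": {"file.txt": "feature edit", "other.txt": "y"},
    "M": {"file.txt": "feature edit", "other.txt": "y"},
}

def simplified_file_log(tip: str, path: str, full_history: bool = False) -> list:
    """Non-merge commits touching `path` that the walk reaches, under a simplified
    model of the default history simplification (or of --full-history)."""
    queue, seen, shown = deque([tip]), {tip}, []
    while queue:
        commit = queue.popleft()
        ps = parents[commit]
        mine = trees[commit].get(path)
        if len(ps) > 1:
            same = [p for p in ps if trees[p].get(path) == mine]
            # Default mode: if the file matches some parent, follow only that
            # parent (the first such one); otherwise follow all parents.
            follow = ps if (full_history or not same) else same[:1]
        else:
            before = trees[ps[0]].get(path) if ps else None  # root: file counts as added
            if before != mine:
                shown.append(commit)
            follow = ps
        for p in follow:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return shown

print(simplified_file_log("M", "file.txt"))                     # ['H', 'R']
print(simplified_file_log("M", "file.txt", full_history=True))  # ['G', 'R', 'H']
```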

There are other modes for History Simplification, which you can select with various flags. This answer is already too long, so I will leave them to the History Simplification section of the git log documentation.

Conclusion

You cannot draw too many inferences from what git log -- path shows you, because of the history simplification that Git performs. If you want to see everything, consider running git log --full-history -m -p -- path instead. The -m option splits each merge for git diff purposes (this goes with the -p option), and the --full-history forces Git to follow all parents at all times.

Answered by torek, Dec 17 '22 23:12