
Git log first parent and follow

Tags:

git

I would like to get the list of merge commits related to a file, and to follow the file across renames. I can't find a way to achieve this with the --first-parent and --follow flags: they don't seem to work as expected when used together.

To give you an example, let's say I have a file named foo.txt.

  1. On the master branch, I append "hello" to this file and commit with the message commit: hello
  2. I create a new branch named branch-world, append "world" to foo.txt and commit with the message commit: world
  3. I move to master and run git merge --no-ff branch-world
  4. I create a new branch named branch-rename, run git mv foo.txt bar.txt, commit with the message commit: rename
  5. I move to master and run git merge --no-ff branch-rename
  6. I create a new dummy file dummy.txt without content and commit this file with the message commit: dummy

Given these steps:

  • git log --oneline gives me too much information since I only want what's about foo.txt:

    cf0c1e4 (HEAD -> master) commit: dummy
    ca857ce Merge branch 'branch-rename'
    45ab4bc (branch-rename) commit: rename
    2057b9c Merge branch 'branch-world'
    46605f3 (branch-world) commit: world
    c52a91c commit: hello
    
  • git log --oneline -- bar.txt does not provide any information about foo.txt:

    45ab4bc (branch-rename) commit: rename
    
  • git log --oneline --follow -- bar.txt provides the children commits, rename included but doesn't show the merge commits:

    45ab4bc (branch-rename) commit: rename
    46605f3 (branch-world) commit: world
    c52a91c commit: hello
    
  • git log --oneline --first-parent -- bar.txt provides the merge commits but does not retrieve the commit related to foo.txt:

    ca857ce Merge branch 'branch-rename'
    
  • git log --oneline --follow --first-parent -- bar.txt does not return anything.

Any idea?

Simon asked May 31 '26 18:05



1 Answer

As you noted in comments, you need:

git log -m --oneline --follow --first-parent -- bar.txt

I would argue that this is a bug. The fact that -m gets around the problem tells us that when using --first-parent, Git should probably do an implied -m, the same way that --follow implies -M (find renames).


Let's start with how git log shows commits by default. It begins with a priority queue into which it puts all the commits you specified on the command line:

git log br1 br2 br3

means to look at the tip commits of branches br1, br2, and br3, so those three commits (by hash ID) go into the priority queue. If you don't specify a starting commit, git log uses HEAD.

It then plucks the next (front of queue) commit from the queue, which is the one with the highest priority. If there's only one commit in the queue—e.g., when run with HEAD—that's the one commit and the queue is now empty. The default priority is by committer-date time stamp, so that the highest-valued date—the one furthest into the future—wins this race. If there are no future-dated commits in the queue, the one that's the least-into-the-past wins instead. If there's just one commit—the usual case—that one commit wins the priority race.

Git now shows this commit by printing its hash ID and log message, or other details according to your --pretty= or --format= directive. Note that --oneline is just shorthand for --pretty=oneline --abbrev-commit.

Then, as long as the commit is not a merge, Git runs a git diff <parent> <commit> to show you the differences here. Any diff options you add to the git log line, such as --name-status, affect this diff output. But by default, if the commit is a merge, git log just proceeds to its last step.

Now that git log has shown the commit, it places all the commit's parents into the priority queue. If the commit is ordinary (has one parent), the queue length is now the same as it was before Git showed this one commit. If it's a merge commit, this puts two or more parents in the queue; if it's a root commit, it does nothing.

So, again, the overall driver of the sequence is this:

  • Initialize priority queue.
  • Add all command line selected commits, or HEAD if none.
  • Loop until queue is empty:
    1. Take head of queue.
    2. Show commit hash and log message (or whatever is selected by format).
    3. If not merge, show diff.
    4. Place parents into queue.
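
The loop above can be sketched in Python on a toy graph shaped like the one in the question. This is only an illustration, not Git's implementation: the commit names, the timestamps, and the `seen` set (used here so that a commit reachable through two paths is queued once) are assumptions made for the example.

```python
import heapq

# Toy commit graph: name -> (committer timestamp, list of parents).
# Names and timestamps are made up; only their relative order matters.
COMMITS = {
    "hello":        (1, []),
    "world":        (2, ["hello"]),
    "merge-world":  (3, ["hello", "world"]),
    "rename":       (4, ["merge-world"]),
    "merge-rename": (5, ["merge-world", "rename"]),
    "dummy":        (6, ["merge-rename"]),
}

def walk(starts):
    """Yield commit names in the order the simplified loop would show them."""
    queue = []    # heap of (-timestamp, name), so the newest commit pops first
    seen = set()  # assumption for this sketch: queue each commit only once
    for name in starts:
        if name not in seen:
            seen.add(name)
            heapq.heappush(queue, (-COMMITS[name][0], name))
    while queue:
        _, name = heapq.heappop(queue)      # 1. take head of queue
        yield name                          # 2. "show" the commit
        for parent in COMMITS[name][1]:     # 4. place parents into queue
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(queue, (-COMMITS[parent][0], parent))

order = list(walk(["dummy"]))
print(order)
```

With these timestamps the walk visits dummy, merge-rename, rename, merge-world, world, hello, which is the same shape as the plain git log --oneline listing in the question.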

We can change step 3 of this loop by adding any of the options -c, --cc, or -m. However, we can also change the overall loop—steps 2 and 4 in particular—using a path name like bar.txt, or options like --first-parent, or options like --since and --until.

Really, we should restructure this loop to read:

  1. Take head of queue.
  2. Decide whether to show it at all. If selected for showing:

    • Show commit hash and log message (or whatever is selected by format).
    • If not a merge, or if forced by -c or --cc or -m, show diff.
  3. Place selected parents into queue.

Numerous git log options select particular commits, and this includes --since, --until, --author, --grep, and so on. This is true of -- bar.txt as well: that tells Git to select commits that modify the named file.

When using path names, though, git log turns on History Simplification, which—at least by default—affects the selection in step 3 as well. In particular, when selecting parents to put into the queue, Git does a very clever trick: it places into the queue a parent commit that didn't modify the file(s) you listed on the command line. In other words, it completely prunes a side branch that did change the file!

If you are trying to figure out why something didn't change a file across a merge,1 this is not what you want at all. But if you're trying to figure out what did create a particular version of the file, this is what you want, because the merge has ignored the change from the branch it's not following. You're trying to figure out why bar.txt has some particular text in it, and the branch didn't put that text in it in the end, so the branch must be uninteresting!

That's not what's going on in your example, but it is worth noting. One can add --full-history to avoid history simplification, or various other flags to change the way the history simplification happens, but this is what all that verbiage about "TREESAME" means, in the documentation.
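
As a rough Python illustration of the parent selection described above (a sketch of the idea only, not Git's code): if the commit's version of the path matches a parent's version, only that parent is followed; otherwise all parents are. The commit names and file contents below are hypothetical stand-ins for the repository in the question.

```python
# Hypothetical snapshots: commit name -> {path: content}.
TREES = {
    "hello":        {"foo.txt": "hello\n"},
    "world":        {"foo.txt": "hello\nworld\n"},
    "merge-world":  {"foo.txt": "hello\nworld\n"},
    "rename":       {"bar.txt": "hello\nworld\n"},
    "merge-rename": {"bar.txt": "hello\nworld\n"},
}
PARENTS = {
    "hello":        [],
    "world":        ["hello"],
    "merge-world":  ["hello", "world"],
    "rename":       ["merge-world"],
    "merge-rename": ["merge-world", "rename"],
}

def parents_to_follow(commit, path):
    """Return the parents this simplified model would put into the queue."""
    version = TREES[commit].get(path)
    for parent in PARENTS[commit]:
        # A parent holding the same version of the path is "TREESAME" here:
        # follow only that one and prune the others.
        if TREES[parent].get(path) == version:
            return [parent]
    return list(PARENTS[commit])

print(parents_to_follow("merge-world", "foo.txt"))
print(parents_to_follow("merge-rename", "bar.txt"))
```

In this toy model, the merge of branch-rename follows only the rename commit when asked about bar.txt, because that parent already holds the merged version of the file.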


1In particular, if you're looking for a change you thought went in but doesn't seem to be in the current file, History Simplification just gets in your way, and you should use --full-history.


Combined diffs, -m, and --first-parent

It's now time to talk about combined diffs, which ties into the notion of "TREESAME". Remember that the definition of a merge commit is any commit with two or more parents (usually just two). Remember that git diff normally compares just two commits, and for ordinary commits, git show and git log compare the parent—the one-count-it-1 commit that is the parent—to the child. For a merge commit, though, there are at least two parents. Which one should we compare?

Git's answer to this dilemma is to use combined diffs, where Git compares all the parents to the child. To make this easier, Git does a first pass to eliminate all the files where the child (merge) commit version of the file matches any of the parent versions. The theory here is that since the merge's version of bar.txt matches at least one parent version, you don't need to see the change now. Git will probably follow that parent, either because it's looking at all parents, or you're using history simplification and this is a file we care about so we're going to follow a TREESAME parent.

Hence, we will only see bar.txt in a combined diff if the merge's version of bar.txt is different from every parent's version of bar.txt. In that case, Git will show the changes using the combined diff format.
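
The filtering rule can be written down as a small Python sketch. It models only the "differs from every parent" test, not the combined diff output itself, and the trees are hypothetical dictionaries mapping path to content.

```python
def combined_diff_paths(merge_tree, parent_trees):
    """Return paths whose merge version differs from every parent's version."""
    paths = set(merge_tree)
    for tree in parent_trees:
        paths.update(tree)
    shown = []
    for path in sorted(paths):
        merged = merge_tree.get(path)  # None means "path absent"
        if all(tree.get(path) != merged for tree in parent_trees):
            shown.append(path)
    return shown

# The merge of branch-rename from the question: bar.txt matches the second
# parent and the absent foo.txt matches it too, so nothing is shown.
merge_rename = {"bar.txt": "hello\nworld\n"}
first_parent = {"foo.txt": "hello\nworld\n"}
second_parent = {"bar.txt": "hello\nworld\n"}
print(combined_diff_paths(merge_rename, [first_parent, second_parent]))

# A hypothetical hand-resolved conflict differs from both parents.
resolved = {"notes.txt": "both\n"}
ours = {"notes.txt": "ours\n"}
theirs = {"notes.txt": "theirs\n"}
print(combined_diff_paths(resolved, [ours, theirs]))
```

Under this model the rename merge produces an empty combined diff, so there is nothing for a path-limited log to match on that commit.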

Git does not check for --first-parent here. The combined diff code works the same with or without --first-parent. This is the part that seems to me to be a bug.

Using --first-parent alters step 3 of our restructured loop: when adding parents to the priority queue, Git adds only the first parent of each merge. Since it's only going to follow the first parent, it seems like this should disable the combined diff code entirely, a la the -m argument.
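
A minimal Python sketch of that altered step, assuming a toy parent table with made-up commit names: because only the first parent is ever queued, the walk degenerates into a simple chain.

```python
# Hypothetical parent lists; the first entry of each list is the first parent.
PARENTS = {
    "hello":        [],
    "world":        ["hello"],
    "merge-world":  ["hello", "world"],
    "rename":       ["merge-world"],
    "merge-rename": ["merge-world", "rename"],
    "dummy":        ["merge-rename"],
}

def first_parent_chain(start):
    """Follow only the first parent of every commit, starting at `start`."""
    chain = []
    name = start
    while name is not None:
        chain.append(name)
        parents = PARENTS[name]
        name = parents[0] if parents else None  # side branches are never queued
    return chain

chain = first_parent_chain("dummy")
print(chain)
```

The commits made on branch-world and branch-rename never appear in the chain; only the merges that brought them into master do.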

The -m argument tells Git to split each merge into multiple virtual children for git diff purposes. Instead of making one big diff against all parents, it pretends that the (single) merge commit is actually multiple ordinary commits, each with one parent, but sharing the same source tree. That way each git diff has just two commits: one parent, one child.

Combining -m with --first-parent, git log will inspect the first parent against the merge commit, which is just what we want here.
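
To make the effect concrete, here is a small Python model of splitting a merge into one two-way comparison per parent, optionally restricted to the first parent. It is an assumption-laden sketch: trees are hypothetical path-to-content dictionaries, and rename detection is left out, so a rename shows up as two changed paths.

```python
def changed_paths(parent_tree, child_tree):
    """Paths whose content differs between one parent and the child."""
    paths = set(parent_tree) | set(child_tree)
    return sorted(p for p in paths if parent_tree.get(p) != child_tree.get(p))

def split_merge(merge_tree, parent_trees, first_parent=False):
    """One two-way comparison per parent, or only the first if requested."""
    parents = parent_trees[:1] if first_parent else parent_trees
    return [changed_paths(parent, merge_tree) for parent in parents]

# The merge of branch-rename from the question.
merge_rename = {"bar.txt": "hello\nworld\n"}
merge_world = {"foo.txt": "hello\nworld\n"}    # first parent
rename_commit = {"bar.txt": "hello\nworld\n"}  # second parent

print(split_merge(merge_rename, [merge_world, rename_commit]))
print(split_merge(merge_rename, [merge_world, rename_commit], first_parent=True))
```

Against the first parent alone, both foo.txt and bar.txt differ, so the merge now has a diff that touches bar.txt and can be selected.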

A side note on --follow

What --follow does is kind of a hack. You are only allowed to give one path name, such as bar.txt, to --follow. This then enables rename-finding, as if you had specified -M or --find-renames, or set diff.renames to true in your Git configuration, or by default if you are using Git version 2.9 or newer.

When Git is doing its rename finding and --following, after deciding whether to show the commit (step 2), Git will change the (single) name it's looking for, if the specific target file was renamed. In my case, after reproducing your setup, I can run this:

$ git log -m --oneline --follow --first-parent --name-status -- bar.txt
f2f5743 Merge branch 'branch-rename'
R100    foo.txt bar.txt
4cd490a Merge branch 'branch-world'
M       foo.txt
4afa129 commit: hello
A       foo.txt

The R100 above is the result of rename detection. Git knows that from this point onward—i.e., in commits anywhere earlier in history—the file bar.txt is now known instead as foo.txt ... so now, instead of looking for bar.txt, Git starts looking for foo.txt.
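
The name switch can be imitated with a short Python sketch that walks the first-parent chain. It is a simplification under stated assumptions: the commit names and contents are made up, and a rename is recognized only when the content is byte-for-byte identical, whereas Git's real rename detection scores similarity.

```python
# Hypothetical snapshots along the first-parent chain of master.
TREES = {
    "hello":        {"foo.txt": "hello\n"},
    "merge-world":  {"foo.txt": "hello\nworld\n"},
    "merge-rename": {"bar.txt": "hello\nworld\n"},
    "dummy":        {"bar.txt": "hello\nworld\n", "dummy.txt": ""},
}
FIRST_PARENT = {
    "dummy": "merge-rename",
    "merge-rename": "merge-world",
    "merge-world": "hello",
    "hello": None,
}

def follow(start, path):
    """Return (commit, status, detail) events for `path`, switching names on renames."""
    events = []
    commit = start
    while commit is not None:
        child = TREES[commit]
        parent_name = FIRST_PARENT[commit]
        parent = TREES[parent_name] if parent_name else {}
        if path in child:
            if path in parent:
                if parent[path] != child[path]:
                    events.append((commit, "M", path))
            else:
                # Exact-match rename source: gone from the child, same content.
                old = next((p for p, data in sorted(parent.items())
                            if p not in child and data == child[path]), None)
                if old is not None:
                    events.append((commit, "R100", f"{old} -> {path}"))
                    path = old  # from here on, look for the old name
                else:
                    events.append((commit, "A", path))
        commit = parent_name
    return events

events = follow("dummy", "bar.txt")
for event in events:
    print(event)
```

The three events line up with the R100, M, and A entries in the transcript above.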

If you use --full-history to make Git follow both "sides" of the merge (both parents), eliminating the --first-parent option, we see that there is a kind of flaw here:

$ git log -m --oneline --follow --name-status -- bar.txt            
f2f5743 (from 4cd490a) Merge branch 'branch-rename'
R100    foo.txt bar.txt
f31ad99 (branch-rename) commit: rename
D       foo.txt
4cd490a (from 4afa129) Merge branch 'branch-world'
M       foo.txt
9b4999d (branch-world) commit: world
M       foo.txt
4afa129 commit: hello
A       foo.txt

Now, in branch-rename, we didn't actually delete foo.txt: we renamed it, so there is no foo.txt at all. But the rename detection mechanism is being abused a bit by the --follow code, so that when Git goes to do the diff of commit f31ad99 against its parent 4cd490a, it doesn't notice that there was a rename here. Having already switched to looking for foo.txt, it just sees that the file is there in the parent and isn't there in the child, and reports a deletion.

Fundamentally, the problem here is that --follow applies at the time Git shows the merge commit: it switches from the new name to the old name immediately. When traversing any leg of the merge that happens to use the new name instead of the old name, it won't see the file there. Only when traversing a leg of the merge that uses the old name will Git see changes to the file.

torek answered Jun 02 '26 08:06



