
Git log for a directory including merges

Tags:

git

I'm writing a script that takes a path as a parameter and outputs Git commits for that path, similarly to what GitHub does when you click the History button in some folder (here's an example). In essence, I wanted to write the script like this:

git log -10 --oneline -- "$directory"

However, I have a lot of trouble making it work reliably for both the root folder and subdirectories, and for various configurations of the follow behavior. After the experiments below, I also think that I'm misunderstanding how git log works with various flags in the first place so it would be great if someone could help me understand that.

Note: in our case, we need to pass multiple directories sometimes, which is why I'm trying to make pathspec work, as opposed to something like cd subdirectory && git log.
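For illustration, the multi-directory variant of the command above could be wrapped like this. This is only a sketch: the function names and the `plugins` directory are made up for the example, and it assumes `git` is on the PATH when `log_for_paths` is actually called.

```python
import subprocess


def build_log_command(paths, limit=10):
    """Build the argv for `git log -<limit> --oneline -- <paths...>`."""
    if not paths:
        raise ValueError("at least one path is required")
    # Everything after "--" is read by git as a pathspec,
    # never as a revision or an option.
    return ["git", "log", f"-{limit}", "--oneline", "--", *paths]


def log_for_paths(paths, limit=10, repo="."):
    """Run the command inside `repo` and return its output lines."""
    result = subprocess.run(
        build_log_command(paths, limit),
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


# "plugins" is a hypothetical second directory.
print(build_log_command(["docs", "plugins"]))
```

Passing the paths as separate list items (rather than one shell string) avoids quoting problems with directory names that contain spaces.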

The examples are from the versionpress/versionpress repository (in the 2071052a state) and I have tested it in both Git for Windows 2.17.1.windows.2 and on Linux with Git 2.14.1.

First, a basic log for the docs folder:

$ git log -10 --oneline -- docs
81554a46 Small updates of Dev-Setup.md
84d3229e Updated intro message to mention beta instead of alpha
8244d42a Added link to announcement blog post
da97c7c3 Merge branch 'master' into 1263-pre-4.0beta-polish
5c99bc30 Merge pull request #1270 from versionpress/4.0-beta-release-notes
f22b0e27 4.0-beta release notes updated, are now close to final
f0de171f Specific WordPress version used in dev-env Dockerfile, Dev-Setup slightly updated
bcd6d4e8 package-lock JSONs updated for npm 5.1 (there were some issues reported with 5.0)
1438bc43 Documented running tests by picking a test suite from phpunit.xml, the "other tests" documented in more detail
5af500d4 Better instructions on running specific tests from CLI, tests' docker-compose.yml cleaned up, various other testing "readme" updates

Good, this is the same as on GitHub.

Now, the same for the root directory:

$ git log -10 --oneline -- .
94084136 Typo in the activation message
8a15a8b1 Added error when re-activating VP with WP CLI
6ad07b7d package-lock.json updated
e7e356f8 Updated WP and WP-CLI versions in ext-libs
777e2d05 Disable event propagation after click on the commit table checkbox
81554a46 Small updates of Dev-Setup.md
32d6dbe2 Installed WordPress version bumpted to 4.9
459c2e6d package-lock.json updated for npm 5.5
77ad593c Link to Gitter and support repo in ISSUE_TEMPLATE.md
84d3229e Updated intro message to mention beta instead of alpha

Not good, merge commits are missing. I can remove the path spec to get them:

$ git log -10 --oneline
2071052a (HEAD -> master, temp/master, origin/master, origin/HEAD) Merge pull request #1318 from aidik/master
94084136 Typo in the activation message
129bf972 Merge pull request #1314 from x1024/master
8a15a8b1 Added error when re-activating VP with WP CLI
288e305d Merge pull request #1310 from versionpress/1307-wp-update-fixes
6ad07b7d package-lock.json updated
e7e356f8 Updated WP and WP-CLI versions in ext-libs
f8d22592 Merge pull request #1309 from versionpress/1308-fix-commits-table-checkboxes
777e2d05 Disable event propagation after click on the commit table checkbox
06208405 Merge pull request #1307 from versionpress/wp-4.9-for-test-sites

But this is impractical for my script, plus I'm thinking that I'm doing something wrong because I believe that the output of git log and git log -- . should be the same, shouldn't it?

From my experiments, it seems that the --full-history flag adds the merge commits when run on the . directory:

$ git log -10 --oneline --full-history -- .
2071052a (HEAD -> master, temp/master, origin/master, origin/HEAD) Merge pull request #1318 from aidik/master
94084136 Typo in the activation message
129bf972 Merge pull request #1314 from x1024/master
8a15a8b1 Added error when re-activating VP with WP CLI
288e305d Merge pull request #1310 from versionpress/1307-wp-update-fixes
6ad07b7d package-lock.json updated
e7e356f8 Updated WP and WP-CLI versions in ext-libs
f8d22592 Merge pull request #1309 from versionpress/1308-fix-commits-table-checkboxes
777e2d05 Disable event propagation after click on the commit table checkbox
06208405 Merge pull request #1307 from versionpress/wp-4.9-for-test-sites

However, it "breaks" the docs subdirectory (note the first merge commit which shouldn't be there):

$ git log -10 --oneline --full-history -- docs
06208405 Merge pull request #1307 from versionpress/wp-4.9-for-test-sites
81554a46 Small updates of Dev-Setup.md
84d3229e Updated intro message to mention beta instead of alpha
8244d42a Added link to announcement blog post
9671af87 (tag: 4.0-beta) Merge pull request #1283 from versionpress/1263-pre-4.0beta-polish
da97c7c3 Merge branch 'master' into 1263-pre-4.0beta-polish
5c99bc30 Merge pull request #1270 from versionpress/4.0-beta-release-notes
f22b0e27 4.0-beta release notes updated, are now close to final
f0de171f Specific WordPress version used in dev-env Dockerfile, Dev-Setup slightly updated
bcd6d4e8 package-lock JSONs updated for npm 5.1 (there were some issues reported with 5.0)

This can be "fixed" by adding --simplify-merges:

$ git log -10 --oneline --full-history --simplify-merges -- docs
81554a46 Small updates of Dev-Setup.md
84d3229e Updated intro message to mention beta instead of alpha
8244d42a Added link to announcement blog post
da97c7c3 Merge branch 'master' into 1263-pre-4.0beta-polish
5c99bc30 Merge pull request #1270 from versionpress/4.0-beta-release-notes
f22b0e27 4.0-beta release notes updated, are now close to final
b8a138ce Fixed path of plugin definition discovery
48333d82 4.0-beta release notes written (some TODOs still remaining)
2c61613f 4.0-alpha1 Markdown file renamed to such (used to be just 4.0) and updated to contain the same info as the GitHub release page
f0de171f Specific WordPress version used in dev-env Dockerfile, Dev-Setup slightly updated

But that also causes trouble for the . directory:

$ git log -10 --oneline --full-history --simplify-merges -- .
94084136 Typo in the activation message
8a15a8b1 Added error when re-activating VP with WP CLI
6ad07b7d package-lock.json updated
e7e356f8 Updated WP and WP-CLI versions in ext-libs
777e2d05 Disable event propagation after click on the commit table checkbox
81554a46 Small updates of Dev-Setup.md
32d6dbe2 Installed WordPress version bumpted to 4.9
459c2e6d package-lock.json updated for npm 5.5
77ad593c Link to Gitter and support repo in ISSUE_TEMPLATE.md
84d3229e Updated intro message to mention beta instead of alpha

I tried the -m flag as advised here but with no differences.

Now, if a user has log.follow set to true in their config, there is also some behavior that I don't fully understand.

$ git config --global log.follow true
(empty output)

$ git log -10 --oneline --merges -- .
(empty output)

No merge commits at all, even when asked for it. I need to add --no-follow (should probably be documented in Git docs):

$ git log -10 --oneline --merges --no-follow -- .
da97c7c3 Merge branch 'master' into 1263-pre-4.0beta-polish
5c99bc30 Merge pull request #1270 from versionpress/4.0-beta-release-notes
aba96d3f Merge pull request #1277 from versionpress/1274-using-filter-on-init
82a3fd4e Merge pull request #1269 from versionpress/ext-libs-install-locked
ccb74422 Merge pull request #1251 from versionpress/1120-edit-update-action
a94dc0d3 Merge pull request #1246 from versionpress/1176-plugin-definition-discovery
ae530356 Merge pull request #1260 from versionpress/1154-temp-in-zip
ffd7647e Merge pull request #1170 from versionpress/1168-getmenureference-broken
f4a00328 Merge branch 'master' into 1120-edit-update-action
7b29e7ed Merge branch 'master' into 1041-dockerized-dev-setup

So my hope would be that adding --no-follow and removing --merges would produce the expected output, however, it still misses merge commits in that case:

$ git log -10 --oneline --no-follow -- .
94084136 Typo in the activation message
8a15a8b1 Added error when re-activating VP with WP CLI
6ad07b7d package-lock.json updated
e7e356f8 Updated WP and WP-CLI versions in ext-libs
777e2d05 Disable event propagation after click on the commit table checkbox
81554a46 Small updates of Dev-Setup.md
32d6dbe2 Installed WordPress version bumpted to 4.9
459c2e6d package-lock.json updated for npm 5.5
77ad593c Link to Gitter and support repo in ISSUE_TEMPLATE.md
84d3229e Updated intro message to mention beta instead of alpha

This is consistent with the behavior above but I still don't understand it: I would assume that git log is essentially a combined output of git log --no-merges and git log --merges but it's not the case when a path is specified.

Any explanation of this would be greatly appreciated.


UPDATE: Maybe the issue is not with merge commits vs. plain commits. I've tried in another repo and compared the outputs when the path is not specified vs. when it's .:

$ git log -10 --oneline --no-follow
350df16f6 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #1977 from versionpress/1975-do-not-upgrade-deleted-sites
21db78245 Merge branch 'master' into 1975-do-not-upgrade-deleted-sites
0a4eda432 Upgrading only sites that are not deleted
43cd96ac9 Merge pull request #1971 from versionpress/fix-ui-creation-of-sites
a952ae1c5 Merge pull request #1966 from versionpress/1949-reduce-db-migrate-boilerplate
018e4d3b7 Merge pull request #1969 from versionpress/1964-platform-api-easy-local-run
4515bc242 Fix add a new site button
6e4ecd652 Merge branch 'master' into prod
4229e326a Fixed Makefile of default-backend
fc99e0f19 [hotfix] Disabled removing TLS hosts

$ git log -10 --oneline --no-follow -- .
21db78245 Merge branch 'master' into 1975-do-not-upgrade-deleted-sites
0a4eda432 Upgrading only sites that are not deleted
43cd96ac9 Merge pull request #1971 from versionpress/fix-ui-creation-of-sites
a952ae1c5 Merge pull request #1966 from versionpress/1949-reduce-db-migrate-boilerplate
018e4d3b7 Merge pull request #1969 from versionpress/1964-platform-api-easy-local-run
4515bc242 Fix add a new site button
4229e326a Fixed Makefile of default-backend
fc99e0f19 [hotfix] Disabled removing TLS hosts
d1bcdae2d Replace 'db-migrate-boilerplate' with a custom implementation
c60032144 Kubernetes.ts returned to its original, non-async structure before 36308d3 with the token loading logic moved to `server.ts` (it didn't really belong to Kubernetes.ts).

It's just a different set of commits and I don't see a clear pattern in including / excluding commits here...


Some possibly relevant things from git-log docs:

  • There is a long section on History simplification, maybe there are some answers in there (didn't study it fully yet).
  • About log.follow configuration (emphasis mine): If true, git log will act as if the --follow option was used when a single <path> is given. This has the same limitations as --follow, i.e. it cannot be used to follow multiple files and does not work well on non-linear history.

Adding some discussion from #git IRC:

[16:44] <+borekb> hi, should git log and git log -- . produce the same results? it seems that the latter is missing some merge commits, sometimes, and I don't quite understand why
[16:45] borekb: it might depend on what directory you are in?
[16:45] <+borekb> I'm in a root of a project
[16:45] <+borekb> I've posted some examples here: Git log for a directory including merges
[16:46] <+borekb> and honestly don't understand what is going on :) I feel like I must be missing something obvious as git log is such a basic command that I've used like a million times, though without a path spec
[16:48] borekb: that might be "history simplification". It's described on man git log.
[16:48] <@gitinfo> borekb: the git-log manpage is available at https://gitirc.eu/git-log.html
[16:49] borekb: oh, based on SO looks like you were already on track.
[16:50] <+borekb> rafasc: I also suspect it's that (direct link: https://git-scm.com/docs/git-log#_history_simplification), but would it be fair to assume that by default, the git log and git log -- . should "simplify" to the same output?
[16:50] <+borekb> my head exploded a little bit when I tried to read that section :)
[16:51] I think the answer is no. It's not safe to assume that. -- is a form of history simplification. So you're telling git you want simplification.
[16:54] <+borekb> good point
[16:56] <+borekb> from my experiments, it looks like -- path is behaving differently when it's some subfolder (git log -- docs) vs. when it's just the current directory (git log -- .).
[16:56] <+borekb> subfolder works as expected, . produces results I do not quite understand
[16:57] borekb: it has to do with that section on the man page that talks what is TREESAME and what isn't.
[16:57] borekb: try dummy_folder/.. :P
[16:59] <+borekb> up_here: clever hack but doesn't work :)
[17:01] borekb: unrelated, but when using --oneline to inspect commits, you might want to use --show-linear-break, in order to understand the relation between commits. (note that --graph --oneline might also be misleading, graph needs at least two lines to draw the edges, --pretty=short in that case is useful)
[17:02] <+borekb> rafasc: oh that's nice
[17:03] <+borekb> rafasc: how would you estimate chance of there being a combination of flags that would lead to git log -- . producing exactly the same output as git log? I'm asking before I dive into the TREESAME discussion which will be pain for me :)
[17:05] borekb: from memory, I would say --full-history... But you had problems with that right?
[17:06] <+borekb> rafasc: yep, --full-history makes more commits appear in git log -- subdirectory

Asked by Borek Bernard, Jun 06 '18 at 11:06


1 Answer

You are indeed being bitten by History Simplification. Note that simplification is enabled by default when using any path names with git log. It is not enabled by default if you do not supply path names. Adding particular options, like --full-history or --simplify-*, changes how the simplification is done.

(You may also get bitten by the implied --follow from having log.follow set to true, but it's harder to see where that would occur for this particular case.)

The simplification works by doing very limited git diffs. Remember that as git log is walking through the commit graph, it is working on one commit C at a time. Each commit C has some set of parent commits. For an ordinary (non-merge) commit, there is just one parent, so for each file in C that is to be examined—based on the path names you gave—either that file in C is 100% identical to that file in its parent P, or it's different, and that's easy for Git to tell because a path that is 100% identical in both commits has the same blob hash in the commit's attached tree.

That's what the TREESAME expression in the documentation means: we take commit C's tree, remove all the paths that aren't being examined, leaving (in memory—none of this affects anything stored in the repository!) a skeleton tree attached to C that has the files that are being examined. Then we take the (single) parent P and do the same thing. The result is either matching—C and its parent P are TREESAME—or non-matching.

The commit is "interesting", and will be displayed, if it is not TREESAME to its parent. Even if it's not interesting, Git will still put the parent P into the graph-walk priority queue to examine later, because this is just an ordinary commit and Git must walk through it to construct a history. (There's some weirdness here with "parent rewriting" that I'm going to skip over, though it matters for --graph.)
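To make the single-parent check concrete, here is a toy model of the comparison. It is a sketch only, not Git's implementation: trees are plain dicts from path to a made-up blob id, the pathspec matching is reduced to "." or a directory prefix, and the file names are invented for the example.

```python
def restrict(tree, pathspecs):
    """Keep only the entries of `tree` (path -> blob id) matched by a pathspec."""
    def matches(path):
        return any(
            spec == "."
            or path == spec
            or path.startswith(spec.rstrip("/") + "/")
            for spec in pathspecs
        )
    return {path: blob for path, blob in tree.items() if matches(path)}


def treesame(child_tree, parent_tree, pathspecs):
    """True when both trees are identical after stripping unexamined paths."""
    return restrict(child_tree, pathspecs) == restrict(parent_tree, pathspecs)


# Hypothetical commit C and its parent P: only plugins/vp.php changed.
parent = {"docs/Dev-Setup.md": "a1", "plugins/vp.php": "b1"}
child = {"docs/Dev-Setup.md": "a1", "plugins/vp.php": "b2"}

print(treesame(child, parent, ["docs"]))  # True: C is not shown for docs
print(treesame(child, parent, ["."]))     # False: C is shown for .
```

The same commit is uninteresting for `docs` and interesting for `.`, purely because of which paths survive the stripping step.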

At merges, however, things are different. Commit C still has its one tree as usual, but it has multiple parent commits Pi. Git will do the same "strip down the tree" operation for each parent. When you're not using --full-history, Git will then compare the stripped-down trees of C vs each Pi. The merge itself is included if it's not TREESAME to any parent, but if it is TREESAME to at least one parent Pi, the merge tends to get excluded (depending on other options) and Git puts only that parent into the priority queue for walking through the graph. If C is TREESAME to several parents Pi, Pj, Pk, ..., Git follows just one of them and discards the rest, by default; the documentation does not promise which one, so don't rely on the choice.

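Adding --full-history disables the discarding of all but one Pi. So now Git will walk all the parents of the merge: both "sides", or all arms if it's a multi-way octopus merge. Per the History Simplification section of the documentation, it also changes which merges are displayed: in this mode a merge is dropped only if it is TREESAME to all of its parents. That is why the pull-request merges reappear in your --full-history -- . output.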

The logic here is that if the file(s) you're looking at are the same in commit C and commit Pi, why then, you don't care that they're different in some other parent Po, because the file has its current form due to parent Pi rather than parent Po. This logic is correct if you think that the file(s) you are looking at are right, but falls apart if you think they are wrong and you are looking for the merge that lost the changes you wanted.
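The merge rules can be sketched as a small graph walk. This is a toy model under stated assumptions, not Git's code: commits are a dict listed oldest first (insertion order stands in for commit date), trees map paths to made-up blob ids, and the --full-history branch follows the git-log documentation's rule that a merge is shown unless it is TREESAME to all of its parents. The three-commit history (A on master, B on a pull-request branch, M merging B into A) is invented for the example.

```python
import heapq


def restrict(tree, spec):
    """Keep only paths under `spec`; "." means the whole tree."""
    if spec == ".":
        return dict(tree)
    prefix = spec.rstrip("/") + "/"
    return {p: b for p, b in tree.items() if p == spec or p.startswith(prefix)}


def simplified_log(commits, head, spec, full_history=False):
    """Return the ids a path-limited walk would display, newest first."""
    age = {cid: i for i, cid in enumerate(commits)}  # oldest commit has age 0
    shown, seen, heap = [], {head}, [(-age[head], head)]
    while heap:
        _, cid = heapq.heappop(heap)  # newest queued commit first
        parents, tree = commits[cid]
        mine = restrict(tree, spec)
        same = [p for p in parents if restrict(commits[p][1], spec) == mine]
        if not parents:
            interesting, follow = bool(mine), []  # root: compared to empty tree
        elif full_history:
            interesting, follow = len(same) < len(parents), parents
        elif same:
            interesting, follow = False, same[:1]  # one TREESAME parent only
        else:
            interesting, follow = True, parents
        if interesting:
            shown.append(cid)
        for p in follow:
            if p not in seen:
                seen.add(p)
                heapq.heappush(heap, (-age[p], p))
    return shown


base = {"README.md": "r1", "docs/a.md": "d1"}
feature = {**base, "src/x.php": "s1"}
commits = {
    "A": ([], base),             # root commit on master
    "B": (["A"], feature),       # pull-request commit
    "M": (["A", "B"], feature),  # merge; tree identical to B
}

print(simplified_log(commits, "M", "."))                     # ['B', 'A']
print(simplified_log(commits, "M", ".", full_history=True))  # ['M', 'B', 'A']
print(simplified_log(commits, "M", "docs"))                  # ['A']
```

In the first call the merge M disappears because it is TREESAME to B, which mirrors the missing "Merge pull request" lines in the question's `git log -- .` output.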

A separate note on --follow

(Since your path name is ., and Git generally does not do directories at all—using a directory name really means all files anywhere under the directory, recursively—this shouldn't matter here. If you use a file name, though, it might matter. Remember that --follow is only obeyed if you're looking at exactly one file.)

The way that --follow works, which is the reason it only works for one path name (and shouldn't be a problem with . as the path), is that when Git is testing whether a commit that it walks, as it walks through the commit graph, is interesting and should therefore be displayed, it is doing these git diffs on each commit vs its parent(s).

Unlike the TREESAME diff, the --follow test is a full diff (more expensive than the quick 100%-the-same check, at least for the more interesting problem cases), but it's limited to one file, which keeps it from being too costly. It also applies only to single-parent commits, though this comes after --first-parent (if you used that) strips away the other parents, or after -m (if you used that) splits a merge into multiple virtual commits that share the same tree, or after history simplification has picked just one parent to follow (see footnote 1). In any case, if the parent does not have a file with the (single) path name that you're logging, Git does a full diff of the parent and the child to see if it can find some renamed file in the parent. If it can find such a renamed file, it first shows the child (because the file changed: at the very least it was renamed) and then changes the path name it is looking for as it traverses to the child's parent.

That is, Git started out looking for dir/sub/file.ext, hit a commit C where the parent of C didn't have a dir/sub/file.ext, did a full-blown diff, and found a sufficiently similar file named path/to/old.name. So Git shows you commit C, saying R<percent> path/to/old.name -> dir/sub/file.ext, and then moves on to P—but now instead of looking for changes to the path dir/sub/file.ext, it's looking for changes to the path path/to/old.name.
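The name-switching can be sketched on a linear history. This is a rough model only: `difflib` similarity stands in for Git's own rename detection (the two do not compute the same score), the 0.5 threshold is an assumption for the example, and the commit ids and file contents are invented.

```python
from difflib import SequenceMatcher


def follow_log(history, path, threshold=0.5):
    """Walk `history` (newest first, items are (id, {path: content})).

    Returns (commit id, path name being tracked) for each commit shown.
    """
    shown = []
    parent_trees = [tree for _, tree in history[1:]] + [{}]  # root has no parent
    for (cid, tree), parent_tree in zip(history, parent_trees):
        if path not in tree:
            break
        if path in parent_tree:
            if tree[path] != parent_tree[path]:
                shown.append((cid, path))  # ordinary content change
            continue
        # The parent lacks this path: look for a similar file that
        # exists in the parent but not in the child (a rename source).
        candidates = [
            (SequenceMatcher(None, old, tree[path]).ratio(), name)
            for name, old in parent_tree.items()
            if name not in tree
        ]
        score, old_name = max(candidates, default=(0.0, None))
        shown.append((cid, path))  # the file appeared or was renamed here
        if score < threshold:
            break                  # plain addition: history of the file ends
        path = old_name            # keep walking under the old name
    return shown


history = [
    ("C3", {"dir/sub/file.ext": "line one\nline two\nline three\n"}),
    ("C2", {"dir/sub/file.ext": "line one\nline two\n"}),  # rename commit
    ("C1", {"path/to/old.name": "line one\nline two\n"}),
    ("C0", {"path/to/old.name": "line one\n"}),
]

for commit_id, tracked in follow_log(history, "dir/sub/file.ext"):
    print(commit_id, tracked)
```

Note that the function tracks exactly one `path` variable at a time, which is the limitation that makes the trick unreliable across merges.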

This particular trick can't work well across all merges: the file could be renamed in just one of the various arms of the merge, or it could be renamed in multiple arms, depending on who did the renaming and when. Git can only look for one path name—it doesn't keep looking for both names. Of course, supplying a path name turns on history simplification, so in general there aren't any merges to worry about after all. The merge case happens only if you use a flag like --full-history or --simplify-merges.


1. Note that if History Simplification has picked one parent from a merge, it has picked a P that is TREESAME to C after stripping out all files except the one we care about, so by definition the one file we're --following in C matches the same-named file in parent P. This means commit C will turn out to be uninteresting after all.

Answered by torek, Oct 24 '22 at 13:10