
How to REALLY show logs of renamed files with git

Tags:

git

I think the general drive behind Linus's point is this (and take it with a pinch of salt): hardcore Git users don't ever care about the history of a "file". You put content in a Git repository because the content as a whole has a meaningful history.

A file rename is a small special case of "content" moving between paths. You might have a function that moves from one file to another, and a Git user could track it down with the "pickaxe" functionality (e.g., git log -S).
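To show why the pickaxe catches a function moving between files, here is a rough Python model of the idea behind git log -S. Real git compares each commit with its parent through diffcore and reports commits. This sketch assumes a toy linear history of full tree snapshots, and Commit and pickaxe are hypothetical names. It reports every file in which the number of occurrences of a string changes.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str               # hypothetical short commit id
    files: dict[str, str]  # full tree snapshot: path -> file content


def pickaxe(history: list[Commit], needle: str) -> list[tuple[str, str, int, int]]:
    """Report (commit, path, count_before, count_after) for every file whose
    number of occurrences of `needle` changed relative to the parent commit.

    `history` is a linear list ordered oldest to newest; the first commit is
    compared against an empty tree.
    """
    hits = []
    parent: dict[str, str] = {}
    for commit in history:
        # Look at every path present on either side of the change.
        for path in sorted(parent.keys() | commit.files.keys()):
            before = parent.get(path, "").count(needle)
            after = commit.files.get(path, "").count(needle)
            if before != after:
                hits.append((commit.sha, path, before, after))
        parent = commit.files
    return hits


history = [
    Commit("c1", {"a.py": "def parse():\n    pass\n"}),
    Commit("c2", {"a.py": "def parse():\n    pass\n# unrelated tweak\n"}),
    # c3 moves parse() from a.py into b.py.
    Commit("c3", {"a.py": "# unrelated tweak\n", "b.py": "def parse():\n    pass\n"}),
]

for hit in pickaxe(history, "def parse"):
    print(hit)
```

Commit c2 does not show up because it touches a.py without changing how often "def parse" occurs. Commit c3 shows up twice: once for the file the function left and once for the file it arrived in. No rename bookkeeping is needed for that. On the command line, the rough counterpart is git log -S'def parse'.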

Other "path" changes include combining and splitting files; Git doesn't really care which file you consider renamed and which one you consider copied (or renamed and deleted). It just tracks the complete content of your tree.

Git encourages "whole tree" thinking whereas many version control systems are very file-centric. This is why Git refers to "paths" more often than it refers to "filenames".
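A commit stores a whole-tree snapshot and no "renamed" or "copied" record, so deciding which path a piece of content came from is an interpretation made afterwards. The hypothetical helper below illustrates this for splitting and combining. It looks only at content, with exact line matching as a crude stand-in for git's own similarity scoring. For each new path, it lists the old paths that already held some of its lines.

```python
def line_origins(old_tree: dict[str, str], new_tree: dict[str, str]) -> dict[str, set[str]]:
    """For each path in `new_tree`, return the set of paths in `old_tree`
    that already contained at least one of its non-blank lines."""
    origins: dict[str, set[str]] = {}
    for new_path, new_text in new_tree.items():
        wanted = {line for line in new_text.splitlines() if line.strip()}
        origins[new_path] = {
            old_path
            for old_path, old_text in old_tree.items()
            if wanted & set(old_text.splitlines())
        }
    return origins


# Splitting: one file becomes two.
split = line_origins(
    {"util.py": "def load(p):\n    return open(p).read()\ndef add(a, b):\n    return a + b\n"},
    {"io_util.py": "def load(p):\n    return open(p).read()\n",
     "math_util.py": "def add(a, b):\n    return a + b\n"},
)

# Combining: two files become one.
combined = line_origins(
    {"a.py": "x = 1\n", "b.py": "y = 2\n"},
    {"ab.py": "x = 1\ny = 2\n"},
)

print(split)
print(combined)
```

Both halves of the split trace back to util.py. Calling io_util.py "the rename" and math_util.py "the copy", or the other way round, is a label layered on top of the snapshots, which record neither.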


I have exactly the same issue you are facing. Although I can't give you an answer, I suggest reading this email Linus wrote back in 2005; it is very pertinent and might give you a hint about how to handle the problem:

…I'm claiming that any SCM that tries to track renames is fundamentally broken unless it does so for internal reasons (ie to allow efficient deltas), exactly because renames do not matter. They don't help you, and they aren't what you were interested in anyway.

What matters is finding "where did this come from", and the git architecture does that very well indeed - much better than anything else out there. …

I found it referenced by this blog post, which could also be useful for you to find a viable solution:

In the message, Linus outlined how an ideal content tracking system may let you find how a block of code came into the current shape. You'd start from the current block of code in a file, go back in the history to find the commit that changed the file. Then you inspect the change of the commit to see if the block of code you are interested in is modified by it, as a commit that changes the file may not touch the block of code you are interested in, but only some other parts of the file.

When you find that before the commit the block of code did not exist in the file, you inspect the commit deeper. You may find that it is one of the many possible situations, including:

  1. The commit truly introduced the block of code. The author of the commit was the inventor of that cool feature you were hunting its origin for (or the guilty party who introduced the bug); or
  2. The block of code did not exist in the file, but five identical copies of it existed in different files, all of which disappeared after the commit. The author of the commit refactored duplicated code by introducing a single helper function; or
  3. (as a special case) Before the commit, the file that currently contains the block of the code you are interested in itself did not exist, but another file with nearly identical contents did exist, and the block of the code you are interested in, together with all the other contents in the file existed back then, did exist in that other file. It went away after the commit. The author of the commit renamed the file while giving it a minor modification.
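The three situations above can be sketched as a small classifier. It looks at the trees just before and just after a single commit and guesses how a block of code reached its current file. This is a minimal illustration under stated assumptions, not a description of any git command. The name classify_origin is hypothetical, blocks are matched as exact substrings, and difflib's ratio stands in for git's own similarity index. Git's rename detection (git diff -M) uses a default similarity threshold of 50%, which is why the sketch uses 0.5.

```python
from difflib import SequenceMatcher


def classify_origin(block: str, path: str, before: dict[str, str],
                    after: dict[str, str], threshold: float = 0.5) -> str:
    """Guess how `block` ended up in after[path], given the whole tree
    (path -> content) just before and just after one commit."""
    if block not in after.get(path, ""):
        raise ValueError("block is not in the file after the commit")
    if block in before.get(path, ""):
        return "already present before this commit"

    # Case 3: the file is new, and a similar file that held the block vanished.
    if path not in before:
        for old_path, old_text in before.items():
            if old_path in after or block not in old_text:
                continue
            similarity = SequenceMatcher(None, old_text, after[path]).ratio()
            if similarity >= threshold:
                return f"renamed from {old_path}"

    # Case 2: copies of the block disappeared from other files.
    sources = sorted(
        p for p, text in before.items()
        if block in text and block not in after.get(p, "")
    )
    if sources:
        return "consolidated from " + ", ".join(sources)

    # Case 1: nothing in the previous tree contained the block.
    return "introduced by this commit"


print(classify_origin("feature()", "main.py",
                      {"main.py": "print('hi')\n"},
                      {"main.py": "print('hi')\nfeature()\n"}))
print(classify_origin("compute(1)", "helpers.py",
                      {"a.py": "x = compute(1)\n", "b.py": "y = compute(1)\n", "helpers.py": ""},
                      {"a.py": "x = shared()\n", "b.py": "y = shared()\n",
                       "helpers.py": "def shared():\n    return compute(1)\n"}))
print(classify_origin("def helper():", "new_name.py",
                      {"old_name.py": "import os\n\ndef helper():\n    return 42\n"},
                      {"new_name.py": "import os\n\ndef helper():\n    return 42  # tweaked\n"}))
```

A real tool would repeat this step commit by commit, following the block into whichever older file it came from.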

In git, Linus's ultimate content tracking tool does not yet exist in a fully automated fashion. But most of the important ingredients are available already.

Please keep us posted on your progress.


I noticed that most of the graphical git front-ends and IDE plugins don't seem to be able to display the history of a file if the file has been renamed

You'll be happy to know that some popular Git UI tools now support this. There are dozens of Git UI tools available, so I won't list them all, but for example:

  • Sourcetree, when viewing a file log, has a "Follow renamed files" checkbox in the bottom left.
  • TortoiseGit has a "follow renames" checkbox in the bottom left of the log window.

More information on Git UI tools:

  • http://git-scm.com/downloads/guis
  • https://git.wiki.kernel.org/index.php/InterfacesFrontendsAndTools

Note: Git 2.9 (June 2016) improves the "buggy" nature of git log --follow quite a bit:

See commit ca4e3ca (30 Mar 2016) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 26effb8, 13 Apr 2016)

diffcore: fix iteration order of identical files during rename detection

If the two paths 'dir/A/file' and 'dir/B/file' have identical content and the parent directory is renamed, e.g. 'git mv dir other-dir', then diffcore reports the following exact renames:

renamed:    dir/B/file -> other-dir/A/file
renamed:    dir/A/file -> other-dir/B/file

(note the inversion here: B/file -> A/file, and A/file -> B/file)

While technically not wrong, this is confusing not only for the user, but also for git commands that make decisions based on rename information, e.g. 'git log --follow other-dir/A/file' follows 'dir/B/file' past the rename.

This behavior is a side effect of commit v2.0.0-rc4~8^2~14 (diffcore-rename.c: simplify finding exact renames, 2013-11-14): the hashmap storing sources returns entries from the same bucket, i.e. sources matching the current destination, in LIFO order.
Thus the iteration first examines 'other-dir/A/file' and 'dir/B/file' and, upon finding identical content and basename, reports an exact rename.
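The inversion described in the commit message can be reproduced with a toy model. This is not diffcore-rename.c, and exact_renames is a hypothetical name. In the model, sources are bucketed by a content hash. Each destination takes the first unused source with identical content, preferring one with the same basename. Reversing the bucket imitates the LIFO order. Because both files are named "file", the basename tie-breaker cannot help, so iteration order alone decides the pairing.

```python
from collections import defaultdict
from hashlib import sha1
from posixpath import basename


def exact_renames(deleted: dict[str, str], added: dict[str, str],
                  lifo: bool) -> list[tuple[str, str]]:
    """Pair deleted paths with added paths whose content is identical.

    Sources are grouped into buckets by a hash of their content. With
    lifo=True a bucket hands back its most recently inserted source first,
    imitating the hashmap behavior described in the commit message.
    """
    buckets: dict[str, list[str]] = defaultdict(list)
    for src in sorted(deleted):
        buckets[sha1(deleted[src].encode()).hexdigest()].append(src)

    used: set[str] = set()
    pairs = []
    for dst in sorted(added):
        candidates = buckets[sha1(added[dst].encode()).hexdigest()]
        if lifo:
            candidates = candidates[::-1]
        best, best_score = None, 0
        for src in candidates:
            if src in used:
                continue  # each source is renamed at most once
            # Identical content scores 1; a matching basename adds 1.
            score = 1 + (basename(src) == basename(dst))
            if score > best_score:
                best, best_score = src, score
                if score == 2:
                    break  # the first "perfect" candidate wins
        if best is not None:
            used.add(best)
            pairs.append((best, dst))
    return pairs


deleted = {"dir/A/file": "same\n", "dir/B/file": "same\n"}
added = {"other-dir/A/file": "same\n", "other-dir/B/file": "same\n"}

print(exact_renames(deleted, added, lifo=True))
print(exact_renames(deleted, added, lifo=False))
```

With lifo=True the model reports dir/B/file -> other-dir/A/file and dir/A/file -> other-dir/B/file, the confusing output quoted above. Keeping insertion order gives the intuitive A-to-A, B-to-B pairing, which is what git log --follow other-dir/A/file needs in order to walk back into dir/A/file.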


With Git 2.31 (Q1 2021), the file-level rename detection has been improved for diffcore.

See commit 350410f (29 Dec 2020), and commit 9db2ac5, commit b970b4e, commit ac14de1, commit 5c72261, commit 81c4bf0, commit ad8a1be, commit 00b8ccc, commit 26a66a6 (11 Dec 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit a5ac31b, 25 Jan 2021)

diffcore-rename: accelerate rename_dst setup

Signed-off-by: Elijah Newren

register_rename_src() simply references the passed pair inside rename_src.

In contrast, add_rename_dst() did something entirely different for rename_dst.
Instead of copying the passed pair, it made a copy of the second diff_filespec from the passed pair, referenced it, and then set the diff_rename_dst.pair field to NULL.
Later, when a pairing is found, record_rename_pair() allocated a full diff_filepair via diff_queue() and pointed its src and dst fields at the appropriate diff_filespecs.

This contrast between register_rename_src() for the rename_src data structure and add_rename_dst() for the rename_dst data structure is oddly inconsistent and requires more memory and work than necessary.
[...] This patch accelerated the setup time by about 65%, and final write back to the output queue time by about 50%, resulting in an overall drop of 3.5% on the execution time of rebasing a few dozen patches.
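As a loose Python analogy for the inconsistency the commit message describes, consider two setup strategies. One copies the destination filespec and later needs a freshly allocated pair. The other simply keeps a reference to the pair it was given, as register_rename_src() does for sources. This is not Git's C code. FileSpec, FilePair and the setup_* and record_* functions are hypothetical, and None stands in for "no source yet". The sketch only shows why the copying version does more allocation work; it does not reproduce what the patch changes beyond that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileSpec:
    path: str


@dataclass
class FilePair:
    one: Optional[FileSpec]  # source side (None: no source known yet)
    two: FileSpec            # destination side


def setup_dst_by_copy(pair: FilePair) -> dict:
    # Copying style: duplicate the destination filespec and forget the pair.
    return {"spec": FileSpec(pair.two.path), "pair": None}


def record_rename_by_copy(entry: dict, src: FileSpec) -> FilePair:
    # ...so a brand-new pair must be allocated once a source is matched.
    entry["pair"] = FilePair(src, entry["spec"])
    return entry["pair"]


def setup_dst_by_reference(pair: FilePair) -> dict:
    # Referencing style, like the source side: keep the pair itself.
    return {"pair": pair}


original = FilePair(None, FileSpec("other-dir/A/file"))

copied = setup_dst_by_copy(original)
rebuilt = record_rename_by_copy(copied, FileSpec("dir/A/file"))
referenced = setup_dst_by_reference(original)

print(copied["spec"] is original.two)   # an extra filespec was created
print(rebuilt is original)              # and later an extra pair
print(referenced["pair"] is original)   # nothing was duplicated
```

In this model the copying path creates two extra objects per destination, while the referencing path creates none. That matches the commit message's complaint that the old approach "requires more memory and work than necessary".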