I think that the general drive behind Linus' point is that - and take this with a pinch of salt - hardcore Git users don't ever care about the history of a "file". You put content in a Git repository because the content as a whole has a meaningful history.
A file rename is a small special case of "content" moving between paths. You might have a function that moves between files, which a Git user might track down with the "pickaxe" functionality (e.g., log -S).
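As a rough illustration of the pickaxe, here is a minimal sketch in a throwaway repository. The file names (old.c, new.c) and the function name frobnicate are made up for the example; the only thing taken from the answer is log -S itself.

```shell
set -eu
# Throwaway repository; every file and function name here is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: old.c holds two functions.
cat > old.c <<'EOF'
int keep_me(void) { return 1; }
int frobnicate(void) { return 42; }
EOF
git add old.c
git commit -q -m "Add frobnicate to old.c"

# Second commit: frobnicate moves to new.c, old.c keeps its other function.
cat > old.c <<'EOF'
int keep_me(void) { return 1; }
EOF
cat > new.c <<'EOF'
int frobnicate(void) { return 42; }
EOF
git add old.c new.c
git commit -q -m "Move frobnicate to new.c"

# -S lists commits that change how many times the string occurs in a file.
pickaxe_all=$(git log -S'frobnicate' --format=%s)
# The same search, limited to the path where the function lives now.
pickaxe_new=$(git log -S'frobnicate' --format=%s -- new.c)
printf '%s\n' "$pickaxe_all"
```

Without a path, the search reports both the commit that introduced the string and the commit that moved it, which is usually enough to walk the function back to where it came from.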
Other "path" changes include combining and splitting files; Git doesn't really care which file you consider renamed and which one you consider copied (or renamed and deleted). It just tracks the complete content of your tree.
Git encourages "whole tree" thinking whereas many version control systems are very file-centric. This is why Git refers to "paths" more often than it refers to "filenames".
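To make the "content, not files" point concrete, here is a small sketch with hypothetical file names (old.txt, new.txt). It assumes nothing beyond a stock git install: the rename is not stored anywhere, the same blob simply shows up under a different path, and the rename is inferred later when you ask for it.

```shell
set -eu
# Throwaway repository; old.txt and new.txt are hypothetical names.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\nline two\nline three\n' > old.txt
git add old.txt
git commit -q -m "Add old.txt"

git mv old.txt new.txt
git commit -q -m "Rename old.txt to new.txt"

# The blob (content) id is the same before and after; only the path changed.
blob_before=$(git rev-parse HEAD~1:old.txt)
blob_after=$(git rev-parse HEAD:new.txt)

# A plain path-limited log stops at the commit that created the new path...
plain_log=$(git log --format=%s -- new.txt)
# ...while --follow asks git to detect the rename and keep going.
follow_log=$(git log --follow --format=%s -- new.txt)

echo "blob before: $blob_before"
echo "blob after:  $blob_after"
printf '%s\n' "$follow_log"
```

The rename only exists as the result of comparing two trees, which is why it appears with --follow and not in the stored history itself.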
I have exactly the same issue that you are facing. Even though I can't give you an answer, I suggest reading this email Linus wrote back in 2005. It is very pertinent and might give you a hint about how to handle the problem:
…I'm claiming that any SCM that tries to track renames is fundamentally broken unless it does so for internal reasons (ie to allow efficient deltas), exactly because renames do not matter. They don't help you, and they aren't what you were interested in anyway.
What matters is finding "where did this come from", and the git architecture does that very well indeed - much better than anything else out there. …
I found it referenced by this blog post, which could also be useful for you to find a viable solution:
In the message, Linus outlined how an ideal content tracking system may let you find how a block of code came into the current shape. You'd start from the current block of code in a file, go back in the history to find the commit that changed the file. Then you inspect the change of the commit to see if the block of code you are interested in is modified by it, as a commit that changes the file may not touch the block of code you are interested in, but only some other parts of the file.
When you find that before the commit the block of code did not exist in the file, you inspect the commit deeper. You may find that it is one of the many possible situations, including:
- The commit truly introduced the block of code. The author of the commit was the inventor of that cool feature you were hunting its origin for (or the guilty party who introduced the bug); or
- The block of code did not exist in the file, but five identical copies of it existed in different files, all of which disappeared after the commit. The author of the commit refactored duplicated code by introducing a single helper function; or
- (as a special case) Before the commit, the file that currently contains the block of the code you are interested in itself did not exist, but another file with nearly identical contents did exist, and the block of the code you are interested in, together with all the other contents in the file existed back then, did exist in that other file. It went away after the commit. The author of the commit renamed the file while giving it a minor modification.
In git, Linus's ultimate content tracking tool does not yet exist in a fully automated fashion. But most of the important ingredients are available already.
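One of those ingredients is git blame with copy detection. The sketch below is only an illustration under assumed names (a.c, b.c, compute_checksum are hypothetical): a block of code is moved into a new file, and blame -C -C is asked where each line originally came from.

```shell
set -eu
# Throwaway repository; a.c, b.c and compute_checksum are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: the block of interest lives in a.c.
cat > a.c <<'EOF'
int unrelated(void) { return 0; }

int compute_checksum(const unsigned char *buffer, int length)
{
    int checksum_accumulator = 0;
    for (int index = 0; index < length; index++)
        checksum_accumulator += buffer[index] * 31;
    return checksum_accumulator;
}
EOF
git add a.c
git commit -q -m "Add compute_checksum to a.c"

# Second commit: the block moves to a new file, b.c.
cat > a.c <<'EOF'
int unrelated(void) { return 0; }
EOF
cat > b.c <<'EOF'
int compute_checksum(const unsigned char *buffer, int length)
{
    int checksum_accumulator = 0;
    for (int index = 0; index < length; index++)
        checksum_accumulator += buffer[index] * 31;
    return checksum_accumulator;
}
EOF
git add a.c b.c
git commit -q -m "Move compute_checksum to b.c"

# -C -C also looks for copies from other files in the commit that creates b.c.
# --line-porcelain prints, for every line, the path it is attributed to.
origin_paths=$(git blame -C -C --line-porcelain b.c | grep '^filename ' | sort -u)
printf '%s\n' "$origin_paths"
```

If copy detection finds the block, the lines of b.c are attributed to the earlier commit and to a.c rather than to the commit that merely moved them, which is the "where did this come from" question from Linus's email.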
Please, keep us posted about your progress on this.
I noticed that most of the graphical git front-ends and IDE plugins don't seem to be able to display the history of a file if the file has been renamed
You'll be happy to know that some popular Git UI tools now support this. There are dozens of Git UI tools available, so I won't list them all, but for example:
More information on Git UI tools:
Note: git 2.9 (June 2016) will improve quite a bit the "buggy" nature of git log --follow:
See commit ca4e3ca (30 Mar 2016) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 26effb8, 13 Apr 2016)
If the two paths 'dir/A/file' and 'dir/B/file' have identical content and the parent directory is renamed, e.g. 'git mv dir other-dir', then diffcore reports the following exact renames:
renamed: dir/B/file -> other-dir/A/file
renamed: dir/A/file -> other-dir/B/file
(note the inversion here: B/file -> A/file, and A/file -> B/file)
While technically not wrong, this is confusing not only for the user, but also for git commands that make decisions based on rename information, e.g. 'git log --follow other-dir/A/file' follows 'dir/B/file' past the rename.
This behavior is a side effect of commit v2.0.0-rc4~8^2~14 (diffcore-rename.c: simplify finding exact renames, 2013-11-14): the hashmap storing sources returns entries from the same bucket, i.e. sources matching the current destination, in LIFO order.
Thus the iteration first examines 'other-dir/A/file' and 'dir/B/file' and, upon finding identical content and basename, reports an exact rename.
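If you want to see which pairing your own git version reports, the scenario from the commit message can be reproduced in a scratch repository. This is just a sketch of that setup. The paths are the ones quoted above, and the pairing printed depends on the git version you run.

```shell
set -eu
# Throwaway repository reproducing the layout from the commit message.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Two files with identical content under the same parent directory.
mkdir -p dir/A dir/B
echo "same content" > dir/A/file
echo "same content" > dir/B/file
git add dir
git commit -q -m "Add two identical files"

# Rename the parent directory.
git mv dir other-dir
git commit -q -m "Rename dir to other-dir"

# Ask diffcore for renames between the two commits.
# Each exact rename is listed as: R100 <old path> <new path>
renames=$(git diff -M --name-status HEAD~1 HEAD)
printf '%s\n' "$renames"
```

Either way there are two exact renames. What the fix changes is which source gets paired with which destination, and that pairing is what git log --follow relies on.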
With Git 2.31 (Q1 2021), the file-level rename detection has been improved for diffcore.
See commit 350410f (29 Dec 2020), and commit 9db2ac5, commit b970b4e, commit ac14de1, commit 5c72261, commit 81c4bf0, commit ad8a1be, commit 00b8ccc, commit 26a66a6 (11 Dec 2020) by Elijah Newren (newren).
(Merged by Junio C Hamano -- gitster -- in commit a5ac31b, 25 Jan 2021)
diffcore-rename: accelerate rename_dst setup
Signed-off-by: Elijah Newren
register_rename_src() simply references the passed pair inside rename_src.
In contrast, add_rename_dst() did something entirely different for rename_dst.
Instead of copying the passed pair, it made a copy of the second diff_filespec from the passed pair, referenced it, and then set the diff_rename_dst.pair field to NULL.
Later, when a pairing is found, record_rename_pair() allocated a full diff_filepair via diff_queue() and pointed its src and dst fields at the appropriate diff_filespecs.
This contrast between register_rename_src() for the rename_src data structure and add_rename_dst() for the rename_dst data structure is oddly inconsistent and requires more memory and work than necessary.
[...] This patch accelerated the setup time by about 65%, and final write back to the output queue time by about 50%, resulting in an overall drop of 3.5% on the execution time of rebasing a few dozen patches.