Several times, I have come across the statement that, if you move a single function from one file to another file, Git can track it. For example, this entry says, "Linus says that if you move a function from one file to another, Git will tell you the history of that single function across the move."
But I have a little bit of awareness of some of Git's under-the-hood design, and I don't see how this is possible. So I'm wondering ... is this a correct statement? And if so, how is this possible?
My understanding is that Git stores each file's contents as a Blob, and each Blob has a globally unique identity which arises from the SHA hash of its contents and size. Git then represents folders as Trees. Any filename information belongs to the Tree, not to the Blob, so a file rename for example shows up as a change to a Tree, not to a Blob.
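A quick way to check that a blob's identity comes from its contents and not from its filename is to ask Git for the IDs directly. This is only a sketch: the scratch directory and the file contents are made up for illustration, and it relies on git hash-object, which prints the object ID Git would assign to a file.

```shell
#!/bin/sh
# Sketch: a blob ID depends on the file's contents, not on its name.
set -eu
work=$(mktemp -d)          # hypothetical scratch directory
cd "$work"

printf 'hello\n' > foo
printf 'hello\n' > bar     # same contents as foo, different name

# git hash-object prints the blob ID Git would assign to each file.
id_foo=$(git hash-object foo)
id_bar=$(git hash-object bar)
echo "foo:          $id_foo"
echo "bar:          $id_bar"

# Changing the contents produces a different blob ID.
printf 'hello!\n' > bar
id_changed=$(git hash-object bar)
echo "bar (edited): $id_changed"
```

The first two IDs are identical even though the names differ, which is why a rename only shows up in the Tree.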
So if I have a file called "foo" with 20 functions in it, and a file called "bar" with 5 functions in it, and I move one of the functions from foo into bar (resulting in 19 and 6, respectively), how can Git detect that I moved that function from one file to another?
From my understanding, this would cause 2 new blobs to exist (one for the modified foo and one for the modified bar). I realize a diff could be calculated to show that the function was moved from one file to the other. But I don't see how history about the function could possibly become associated with bar instead of foo (not automatically, anyway).
If Git were to actually look inside of single files, and compute a blob per function (which would be crazy / infeasible, because you'd have to know how to parse any possible language), then I could see how this might be possible.
So ... is the statement correct or not? And if it is correct, then what is lacking in my understanding?
This functionality is provided through git blame -C <file>.
The -C option makes Git try to match chunks of text added to or deleted from the file being examined against the other files modified in the same commit. Repeating the option extends the search: -C -C also looks at other files in the commit that creates the file, and -C -C -C looks at other files in any commit.
Try it yourself in a test repo with git blame -C and you'll see that the block of code you just moved is attributed to the file it originally came from.
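Here is a minimal sketch of such a test, assuming a throwaway repository in a temporary directory. The file names foo and bar follow the question; the two C-like functions and the commit messages are invented for illustration. Note that git blame -C only picks up a moved chunk that is large enough (by default, 40 alphanumeric characters), so the moved function is deliberately wordy.

```shell
#!/bin/sh
# Sketch: move a function from foo to bar, then compare blame with and without -C.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
git config commit.gpgsign false

cat > foo <<'EOF'
int add_two_numbers(int first, int second)
{
    int result_of_addition = first + second;
    return result_of_addition;
}

int multiply_two_numbers(int first, int second)
{
    int result_of_multiplication = first * second;
    return result_of_multiplication;
}
EOF
echo "/* bar starts out with only this comment */" > bar
git add foo bar
git commit -q -m "Add foo and bar"

# Move multiply_two_numbers (the last 6 lines of foo) into bar.
tail -n 6 foo >> bar
head -n 5 foo > foo.tmp && mv foo.tmp foo
git commit -q -am "Move multiply_two_numbers from foo to bar"

echo "--- git blame bar ---"
git blame -s bar
echo "--- git blame -C bar ---"
git blame -s -C bar
```

Without -C, the moved lines are attributed to the second commit; with -C, they are attributed to the first commit and the output shows foo as their origin.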
From the git help blame manual page:
The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off). To follow lines moved from one file to another, or to follow lines that were copied and pasted from another file, etc., see the -C and -M options.
As of Git 2.15, git diff supports detection of moved lines with the --color-moved option. It works for moves across files.
It works, obviously, for colorized terminal output. As far as I can tell, there is no option to indicate moves in plain text patch format, but that makes sense.
For default behavior, try
git diff --color-moved
The option also takes a mode, which currently is one of no, default, plain, zebra and dimmed_zebra (use git help diff to get the latest modes and their descriptions). For example:
git diff --color-moved=zebra
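To see the effect across files, here is a small sketch that assumes Git 2.15 or later and a throwaway repository. The file contents are made up; the only point is that a block of lines leaves foo and appears in bar. It captures the diff with and without --color-moved so the two can be compared.

```shell
#!/bin/sh
# Sketch: compare an ordinary colored diff with a --color-moved diff.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
git config commit.gpgsign false

for i in 1 2 3 4 5 6; do
    echo "line number $i of the block that will be moved"
done > foo
echo "bar starts out with a single line" > bar
git add foo bar
git commit -q -m "Add foo and bar"

# Move lines 3-6 of foo to the end of bar (left uncommitted).
sed -n '3,6p' foo >> bar
head -n 2 foo > foo.tmp && mv foo.tmp foo

plain=$(git diff --color=always)
moved=$(git diff --color=always --color-moved=zebra)
printf '%s\n' "$moved"
```

In a terminal, the moved lines are drawn in different colors from ordinary additions and deletions.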
As to how it is done, you can glean some understanding from this email exchange by the author of the functionality.
A bit of this functionality is in git gui blame (+ filename). It shows an annotation of the lines of a file, each indicating when it was created and when it was last changed. For code moved from another file, it shows the commit of the original file as the creation, and the commit where it was added to the current file as the last change. Try it.
What I would really like is to pass git log a line-number range in addition to a file path, and have it show the history of that block of code. The documentation I read listed no such option, though Git 1.8.4 and later provide git log -L <start>,<end>:<file> for a line range within one file. From Linus' statement, I too would have expected such a command to be readily available.
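A minimal sketch of tracing a line range with git log -L (available since Git 1.8.4), using a throwaway repository. The file foo, the shell functions first and second, and the commit messages are all invented for illustration.

```shell
#!/bin/sh
# Sketch: show only the commits that touched lines 1-3 of foo.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
git config commit.gpgsign false

printf '%s\n' 'first() {' '  echo one' '}' '' \
              'second() {' '  echo two' '}' > foo
git add foo
git commit -q -m "Add first and second"

# Edit inside first() (line 2).
printf '%s\n' 'first() {' '  echo one, revised' '}' '' \
              'second() {' '  echo two' '}' > foo
git commit -q -am "Change first"

# Edit inside second() (line 6).
printf '%s\n' 'first() {' '  echo one, revised' '}' '' \
              'second() {' '  echo two, revised' '}' > foo
git commit -q -am "Change second"

# History of lines 1-3 of foo only.
history=$(git log --format='commit: %s' -L 1,3:foo)
printf '%s\n' "$history"
```

The output lists "Change first" and "Add first and second" with their diffs for that range, and leaves out "Change second", which only touched lines outside it.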