
Why does git blame not follow renames?

$ pwd
/data/mdi2/classes

$ git blame -L22,+1 -- utils.js
99b7a802 mdi2/utils.js (user 2015-03-26 21:54:57 +0200 22)  #comment

$ git blame -L22,+1 99b7a802^ -- utils.js
fatal: no such path mdi2/classes/utils.js in 99b7a802^

As you may have noticed, the file was in a different directory in that commit:

$ git blame -L22,+1 99b7a802^ -- ../utils.js
c5105267 (user 2007-04-10 08:00:20 +0000 22)    #comment 2

The documentation says:

The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off)

Despite that, blame does not follow renames here. Why?

UPDATE: Short answer

git blame follows renames, but not when invoked as git blame COMMIT^ -- <filename>.

However, tracking file renames by hand through a large number of renames and a long history is too hard. I think this behaviour should be fixed so that git blame COMMIT^ -- <filename> silently follows renames. Or, at the very least, --follow should be implemented, so that I can run: git blame --follow COMMIT^ -- <filename>

UPDATE2: That is impossible. Read below.

ANSWER FROM THE MAILING LIST by Junio C Hamano

git blame follows renames, but not for git blame COMMIT^ -- <filename>

Suppose you have file A and file B in your version v1.0.

Six months down the road, the code was much refactored, and you do not need the contents of these two files separately. You have removed A and B and much of what they had is now in file C. That is the current state.

git blame -C HEAD -- C 

may follow the contents from both just fine, but if you were allowed to say

git blame v1.0 -- C 

what does it even mean? C did not exist in v1.0 at all. Are you asking to follow the contents of A back then, or B? How did you tell it you meant A and not B when you told it C in this command?

"git blame" follows content movements, and never treats "renames" in any special way, as it is a stupid thing to do to think a rename is somehow special ;-)

The way you tell the command what content to start digging from is to give, on its command line, a starting-point commit (it defaults to HEAD, but you may give COMMIT^ as in your example) and the path in that starting point. As it does not make any sense to tell C to Git and then magically make it guess you meant A in some cases and B in some others, if v1.0 did not have C, the only sensible thing to do is to exit instead of making a guess (and without telling the user how it guessed).
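The scenario described in this reply can be reproduced in a scratch repository. What follows is only a minimal sketch and is not taken from the mailing list thread: the file contents, the commit messages, and the temporary directory are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: A and B exist in v1.0; later both are removed and their contents
# live on in a new file C. All names and contents here are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'alpha: first line of the original file A\nalpha: second line of the original file A\n' > A
printf 'beta: first line of the original file B\nbeta: second line of the original file B\n' > B
git add A B
git commit -q -m "v1.0: add A and B"
git tag v1.0

# Refactor: C now holds what A and B used to hold.
cat A B > C
git rm -q A B
git add C
git commit -q -m "fold A and B into C"

# Starting from HEAD, the path C exists, so blame has something to dig from.
# -C asks blame to also look for lines that came from other files.
git blame -C HEAD -- C

# Starting from v1.0, there is no path C to dig from, so blame exits.
if git blame v1.0 -- C >/dev/null 2>&1; then
    blame_v1_status=succeeded
else
    blame_v1_status=failed
fi
echo "git blame v1.0 -- C: $blame_v1_status"
```

The second blame fails because the starting point is the pair (commit, path), and v1.0 has no C; nothing in the command says whether A or B was meant.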

Eugen Konkov asked Apr 06 '15 09:04




2 Answers

git blame does follow renames (as does git log if you give it --follow). The problem lies in the way it follows renames, which is a not-very-thorough hack: as it steps back one commit at a time (from each child to each parent), it makes a diff, the same kind of diff you can make manually with:

git diff -M SHA1^ SHA1

and checks whether this diff detected a rename. [1]

That's all fine as far as it goes, but it means that for git blame to detect a rename, (a) git diff -M has to be able to detect it (fortunately that is the case here) and (b) blame must actually step across the rename, which is what's causing you problems.

For instance, suppose the commit graph looks a bit like this:

A <-- B <-- ... Q <-- R <-- S <-- T 

where each uppercase letter represents a commit. Suppose further that a file was renamed in commit R, so that in commits R through T it has name newname while in commits A through Q it has name oldname.

If you run git blame -- newname, the sequence starts at T, compares S and T, compares R and S, and compares Q and R. When it compares Q and R, git blame discovers the name-change, and starts looking for oldname in commits Q and earlier, so when it compares P and Q it compares files oldname and oldname in those two commits.

If, on the other hand, you run git blame R^ -- newname (or git blame Q -- newname) so that the sequence starts at commit Q, there is no file newname in that commit, and there is no rename when comparing P and Q, and git blame simply gives up.

The trick is that if you're starting from a commit in which the file had the previous name, you must give git the old name:

git blame R^ -- oldname 

and then it all works again.
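The behaviour described above can be checked in a throwaway repository. This is a minimal sketch under stated assumptions: the two commits stand in for Q and R from the graph, and the file contents, identity settings, and temporary directory are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: commit "Q" creates oldname, commit "R" renames it to newname.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'first line\nsecond line\nthird line\n' > oldname
git add oldname
git commit -q -m "Q: add oldname"

git mv oldname newname
git commit -q -m "R: rename oldname to newname"

# Start at R (HEAD): blame compares Q and R, sees the rename, and keeps going.
blame_from_head=$(git blame HEAD -- newname)

# Start at Q (HEAD^) with the new name: that path does not exist there.
if git blame HEAD^ -- newname >/dev/null 2>&1; then
    new_name_at_q=succeeded
else
    new_name_at_q=failed
fi

# Start at Q with the old name: blame works again.
blame_from_q=$(git blame HEAD^ -- oldname)

printf '%s\n' "$blame_from_head"
echo "git blame HEAD^ -- newname: $new_name_at_q"
printf '%s\n' "$blame_from_q"
```

In the first blame output, the lines are attributed to the commit that added oldname, and the old file name is printed next to the commit ID because it differs from the name given on the command line.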


[1] In the git diff documentation, you will see that there is a -M option that controls how git diff detects renames. The blame code modifies this a bit (and in fact does two passes, one with -M turned off and a second with -M turned on) and uses its own (different) -M option for somewhat different purposes, but ultimately it's using this same code.


[Edit to add reply to comment (didn't fit as a comment itself)]:

Is there any tool that can show me file renames, like: git renames <filename> SHA date oldname->newname

Not exactly, but git diff -M comes close, and may be close enough.

I'm not sure what you mean by "SHA date" here, but git diff -M allows you to supply two SHA-1s and compares left-vs-right. Add --name-status to get just file names and dispositions. Hence git diff -M --name-status HEAD oldsha1 may report that to convert from HEAD to oldsha1, git believes you should Rename a file and will report the old name as the "new" name. For instance, in the git repository itself, there is a file currently named Documentation/giteveryday.txt that used to have a slightly different name:

$ git diff -M --name-status HEAD 992cb206
M       .gitignore
M       .mailmap
[...snip...]
M       Documentation/diff-options.txt
R097    Documentation/giteveryday.txt   Documentation/everyday.txt
D       Documentation/everyday.txto
[...]

If that's the file you care about, you're good. The two problems here are:

  • finding an SHA1: where did 992cb206 come from? If you already have an SHA-1, that's easy; if not, git rev-list is the SHA1-finding tool; read its documentation;
  • and the fact that following a series of renames through each commit one commit at a time, as git blame does, may produce quite different answers than comparing a much-later commit (HEAD) against a much-earlier commit (992cb206 or whatever). In this case, it comes out the same, but the "similarity index" here is 97 out of 100. If it were to have been modified much more in some of the intermediate steps, that similarity index might fall below 50% ... yet, if we were to compare a revision just a little after 992cb206 to 992cb206 (as git blame would), perhaps the similarity index between those two files might be higher.

What's needed (and missing) is for git rev-list itself to implement --follow, so that all commands that use git rev-list internally (i.e., most commands that work on more than just one revision) can do the trick. Along the way, it would be nice if it worked in the other direction: currently --follow is newer-to-older only, i.e., it works fine with git blame and works OK with git log as long as you don't ask for oldest history first with --reverse.
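For a single file, combining git log --follow with --name-status gets reasonably close to the listing asked for in the comment. The sketch below is one possible approach, not an existing git renames command; the repository, file names, and format string are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: list a file's history across a rename, with the rename itself
# shown as an R line (hash, date, and subject come from --format).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'one\ntwo\nthree\n' > oldname.txt
git add oldname.txt
git commit -q -m "add oldname.txt"

git mv oldname.txt newname.txt
git commit -q -m "rename to newname.txt"

# Newer-to-older walk that continues past the rename.
history=$(git log --follow --name-status --format='%h %ad %s' --date=short -- newname.txt)
printf '%s\n' "$history"

# Direct two-commit comparison; swapping the arguments swaps which name
# is reported as the "new" one.
forward=$(git diff -M --name-status HEAD^ HEAD)
backward=$(git diff -M --name-status HEAD HEAD^)
printf '%s\n%s\n' "$forward" "$backward"
```

The same caveat applies as in the list above: --follow works commit by commit, so on real history its answer may differ from a single diff between two distant commits.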

torek answered Oct 05 '22 05:10


See UPD below: you can now follow renamed files.

Recent versions of git have an interesting feature: git log -L. Add the following to your config:

[alias]
    follow = "!sh -c 'git log --topo-order -u -L $2,${3:-$2}:\"$1\"' -"

Now you can:

$ git follow <filename> <linefrom> [<lineto>]

You will then see each commit that changed the specified lines in <filename>.
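The alias is a thin wrapper around git log -L, which can also be run directly. Here is a minimal sketch in a scratch repository; the file name notes.txt, its contents, and the commit subjects are assumptions made for the demonstration.

```shell
#!/bin/sh
# Sketch: track the history of one line with git log -L <start>,<end>:<file>.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\nline two\nline three\n' > notes.txt
git add notes.txt
git commit -q -m "add notes.txt"

printf 'line one\nline 2 (edited)\nline three\n' > notes.txt
git commit -q -am "edit line two"

printf 'line one\nline 2 (edited)\nline three (edited)\n' > notes.txt
git commit -q -am "edit line three"

# History of line 2 only: the commit that touched only line 3 is not listed.
line_history=$(git log --topo-order -L 2,2:notes.txt --format='commit: %s')
printf '%s\n' "$line_history"
```

Each listed commit is followed by a patch restricted to the tracked range, which is what the -u in the alias asks for explicitly.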

You may also be interested in the --follow option of git log:

Continue listing the history of a file beyond renames (works only for a single file).

If you are interested in copy detection use -C:

Detect copies as well as renames. See also --find-copies-harder. If n is specified, it has the same meaning as for -M.

-C looks at other files changed in the same commit. If you want to detect code taken from a file that was not changed in that commit, you should also pass the --find-copies-harder option:

For performance reasons, by default, -C option finds copies only if the original file of the copy was modified in the same changeset. This flag makes the command inspect unmodified files as candidates for the source of copy. This is a very expensive operation for large projects, so use it with caution. Giving more than one -C option has the same effect.

UPD
I have improved this alias:

[alias]
    follow = "!bash -c '                                                  \
        if [[ $1 == \"/\"* ]]; then                                      \
            FILE=$1;                                                     \
        else                                                             \
            FILE=${GIT_PREFIX}$1;                                        \
        fi;                                                              \
        echo \"git log --topo-order -u -L $2,${3:-$2}:\\\"$FILE\\\" $4 \";   \
        git log -w -b -p --ignore-blank-lines --topo-order -u -L $2,${3:-$2}:\"$FILE\" $4;\
    ' --"

Now you can track how a specified range of lines has changed:

git follow file_name.c 30 35 

You can even continue following the file under a different name, starting from a given commit (the fourth argument):

git follow old_file_name.c 30 35 85ce061 

Here 85ce061 is the commit in which the file was renamed.

NOTICE: Unfortunately, git does not take changes in the working directory into account. If you have local changes to the file, you must stash them before you can follow its changes.

Eugen Konkov answered Oct 05 '22 05:10