I just read the git-blame manual page once more and noticed this part:
A particularly useful way is to see if an added file has lines created by copy-and-paste from existing files. Sometimes this indicates that the developer was being sloppy and did not refactor the code properly. You can first find the commit that introduced the file with:
git log --diff-filter=A --pretty=short -- foo
and then annotate the change between the commit and its parents, using commit^! notation:
git blame -C -C -f $commit^! -- foo
This sounds quite interesting, but I don't quite grok how it works, and why. I wonder whether it can be used in a git hook to detect copy & pasted code.
Can some git expert maybe explain the effect of using the above git commands together, and whether it's possible to use something like that to make git show whether there's code duplication (maybe by using the 'similarity index' which git seems to compute when renaming files)?
Human programmers can detect some instances of copy-paste manually, at least in fairly small code bases. Doing this is not too difficult for an automated tool, even for large programs (though it can be tricky when the copies are modified, as frequently happens).
You can break the commands down individually.
$ git log --diff-filter=A --pretty=short -- foo
displays the log for the file "foo". The --diff-filter=A option restricts the log to commits in which the file was added ("A"), and the --pretty=short option prints each commit in a condensed format. (The -- is the standard way of saying "nothing that follows is an option"; everything after it is a list of file names to which the log is restricted.)
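To pass the result on to the second command, it helps to capture just the commit hash. A minimal sketch, assuming a POSIX shell: the function name find_adding_commit is made up for illustration, and --format=%H is swapped in for --pretty=short so that only the hash is printed.

```shell
# Hypothetical helper: print the hash of the commit that first added a path.
# Usage: find_adding_commit <path>
find_adding_commit() {
    # --diff-filter=A keeps only commits in which the path was added;
    # --format=%H prints the bare commit hash, one per line.
    # A path that was deleted and later re-added matches several commits,
    # so keep the oldest one (git log lists the newest first).
    git log --diff-filter=A --format=%H -- "$1" | tail -n 1
}
```

Inside a repository, commit=$(find_adding_commit foo) would then set the $commit variable that the blame command expects.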
Then:
$ git blame -C -C -f $commit^! -- foo
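git blame annotates each line of a file with the commit that last changed it. The -C option looks for lines that were moved or copied from other files; giving it twice (-C -C) makes git additionally look for copies from other files in the commit that creates the file, which is exactly the case here. The -f option shows the filename in the original commit (which means that if a line was copied from another file, you see the name of the file it was copied from). The $commit^! is notation for $commit on its own; the ^! suffix means to exclude all of $commit's parents, so the blame is limited to the change made by that one commit. Lines copied from files that already existed are then attributed to an earlier commit and to the file they came from, while genuinely new lines are attributed to $commit and foo.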
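The human-readable blame output gets hard to scan for long files. One possible way to summarise it, sketched here under the assumption of a POSIX shell with awk and sort, is to read the machine-readable --line-porcelain output, which repeats a "filename" header for every annotated line. The function name blame_origins is hypothetical.

```shell
# Hypothetical helper: read `git blame --line-porcelain` output on stdin and
# print "<origin file><TAB><number of lines>" for each file that lines came from.
blame_origins() {
    # Every annotated line carries a header of the form "filename <name>".
    # The file's own content lines start with a TAB, so they cannot match.
    awk '/^filename / { count[substr($0, 10)]++ }
         END { for (name in count) printf "%s\t%d\n", name, count[name] }' |
    sort
}
```

It could be used as git blame -C -C --line-porcelain $commit^! -- foo | blame_origins. Any origin other than foo itself then points at a file that lines were copied from. File names containing unusual characters may appear quoted in the porcelain output; this sketch does not unquote them.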
So basically, the first command (git log) helps you find the commit that introduced the file; the second (git blame) then shows which of the lines added by that commit were copied from other files, and where they came from.
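As for the hook idea, a rough sketch of how the two steps might be combined is given here. It assumes a POSIX shell and simple file names, and it would have to run after the commit exists (for example from a post-commit hook), because git blame needs a commit to annotate. The function name report_copied_lines is made up for illustration, and git diff-tree is used instead of git log to list the files a commit added.

```shell
# Hypothetical helper: for every file added by <commit>, print how many of
# its lines git blame attributes to a different file.
# Usage: report_copied_lines <commit>
report_copied_lines() {
    commit=$1
    # List the paths that <commit> added (-r recurses into subdirectories,
    # --root makes this work for the very first commit as well).
    git diff-tree --root --no-commit-id --name-only --diff-filter=A -r "$commit" |
    while IFS= read -r added_file; do
        # Keep only the per-line "filename" headers, then count those that
        # name a file other than the one being annotated.
        # grep -c exits non-zero when the count is 0, hence the "|| true".
        copied=$(git blame -C -C --line-porcelain "$commit^!" -- "$added_file" |
                 grep '^filename ' | grep -cvxF "filename $added_file" || true)
        printf '%s\t%s\n' "$copied" "$added_file"
    done
}
```

Calling report_copied_lines HEAD prints one "count, TAB, file" line per newly added file. By default git only treats a block as copied if it contains at least 40 alphanumeric characters, so short or trivial lines are not reported, and merge commits produce no output because git diff-tree shows nothing for them by default. Whether a non-zero count should fail a check is a policy decision this sketch leaves open.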