
How can I detect copy & pasted code using git?

Tags: git, copy-paste

I just read the git-blame manual page once more and noticed this part:

A particularly useful way is to see if an added file has lines created by copy-and-paste from existing files. Sometimes this indicates that the developer was being sloppy and did not refactor the code properly. You can first find the commit that introduced the file with:

git log --diff-filter=A --pretty=short -- foo

and then annotate the change between the commit and its parents, using commit^! notation:

git blame -C -C -f $commit^! -- foo

This sounds quite interesting, but I don't quite grok how it works, and why. I wonder whether it can be used in a git hook to detect copy & pasted code.

Can some git expert maybe explain the effect of using the above git commands together, and whether it's possible to use something like that to make git show whether there's code duplication (maybe by using the 'similarity index' which git seems to compute when renaming files)?

asked Dec 21 '09 11:12 by Frerich Raabe




1 Answer

You can break the commands down individually.

$ git log --diff-filter=A --pretty=short -- foo

displays the log for the file "foo". The --diff-filter=A option restricts it to commits in which the file was added ("A"), so it normally lists just the commit that introduced foo, and --pretty=short prints that commit in a condensed format. (The -- is the standard way of saying "nothing that follows is an option"; everything after it is a list of file names to which the log is limited.)

Then:

$ git blame -C -C -f $commit^! -- foo

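git blame annotates each line of a file with the last commit that changed it. Giving -C twice makes it look harder for lines copied from other files: a single -C only considers other files modified in the same commit, while -C -C additionally looks for copies from other files in the commit that creates the file. The -f option shows the file name each line had in the commit it is blamed on (which means if a line was copied from another file, you see the name of the file it was copied from). $commit^! is revision notation for $commit on its own: the ^! suffix includes $commit but excludes all of its parents, so blame only looks at the change made by that one commit.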
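To see what the ^! suffix expands to, something like the following can be run inside any repository. It is only an illustration: expand_commit_range is a made-up helper name, not a git command, and HEAD is just a default stand-in for the commit you care about.

```shell
# expand_commit_range is a hypothetical helper, used only for this illustration.
expand_commit_range() {
    # Resolve the argument (default: HEAD) to a full commit hash.
    commit=$(git rev-parse --verify "${1:-HEAD}") || return 1

    # rev-parse spells out the shorthand: the commit itself,
    # followed by each of its parents prefixed with ^ (meaning "exclude").
    git rev-parse "$commit^!"

    # Used as a revision range, it therefore selects exactly one commit.
    git rev-list "$commit^!"
}
```

For a commit with one parent, calling expand_commit_range prints three lines: the commit hash, the parent hash prefixed with ^, and the commit hash again from rev-list.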

So basically, the first command (git log) finds the commit that introduced the file; the second (git blame) then shows which lines of that file were copied from other files in that commit, and from which file they came.
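The two steps can be chained in a small script. This is a rough sketch rather than a tested tool: blame_added_file is a hypothetical name, and picking the oldest "A" commit with tail is my assumption about what you want if a file was deleted and later re-added.

```shell
# blame_added_file is a hypothetical helper, not part of git.
blame_added_file() {
    file=$1

    # Step 1: find the commit that added the file. --format=%H prints only
    # the hash; tail picks the oldest match if the file was added more than once.
    commit=$(git log --diff-filter=A --format=%H -- "$file" | tail -n 1)
    [ -n "$commit" ] || return 1

    # Step 2: annotate only that commit. -C -C also searches other files for
    # copied lines, and -f prints the file name each line is attributed to.
    git blame -C -C -f "$commit^!" -- "$file"
}
```

Running blame_added_file foo prints the blame for foo; lines showing a file name other than foo are the ones git believes were copied. Whether this is fast and reliable enough to run from a hook is something you would have to try on your own repository.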

answered Sep 23 '22 21:09 by mipadi