Are there tools out there which could automatically find copy-and-paste code among a set of files?
I was thinking of writing a script for this that would just search for equal strings, but such a script would find mostly irrelevant equalities (such as private final static ...).
Human programmers can detect some instances of copy-paste manually, at least in fairly small code bases. Doing this is not too difficult for an automated tool, even for large programs (though it can be tricky when the copies are modified, as frequently happens).
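To make the "search for equal strings" idea less noisy, one option is to compare runs of several consecutive lines instead of single lines, so that one-line matches such as a repeated field declaration are ignored. The following is only a minimal sketch under that assumption; the function name `find_duplicate_blocks` and the `min_lines` parameter are made up for illustration and do not come from any of the tools mentioned here.

```python
from collections import defaultdict


def find_duplicate_blocks(files, min_lines=4):
    """Map each repeated run of `min_lines` normalized lines to its locations.

    `files` is a dict of {file name: source text} (a hypothetical input shape).
    Each location is a (file name, 1-based starting line number) pair.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Strip whitespace and drop blank lines so layout changes do not matter.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), start=1)
                 if line.strip()]
        # Slide a window of `min_lines` lines over the file and record it.
        for i in range(len(lines) - min_lines + 1):
            window = tuple(content for _, content in lines[i:i + min_lines])
            seen[window].append((name, lines[i][0]))
    # Keep only the windows that occur more than once.
    return {window: places for window, places in seen.items() if len(places) > 1}
```

A sketch like this only finds exact copies (up to indentation and blank lines). As soon as an identifier is renamed inside the copied block, it misses the clone, which is where grammar-aware tools come in.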
Of course, the reality is that developers still copy and paste code from places like Stack Overflow, CodePen, and freeCodeCamp.
Yes, try the Copy/Paste Detector (CPD), which ships with PMD.
Our CloneDR is a tool for finding exact and near-miss blocks of code constructed by copy and paste activities. It can handle systems of millions of lines of code.
It uses precise language grammars to pick out language structures (identifiers, expressions, statements, blocks, functions, classes, packages, ...) that have been copied, and to determine the points of variation across the sets of clones (any of those structures can be parameters!)
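To illustrate the general idea of grammar-based detection with parameters (this is not CloneDR's actual algorithm, just a toy stand-in), here is a rough sketch that uses Python's standard `ast` module. It replaces every identifier, attribute name, and literal with a placeholder and then groups functions whose bodies have the same normalized syntax tree. The names `find_parameterized_clones` and `_Normalizer` are hypothetical, and the sketch only compares whole function bodies within one Python source string.

```python
import ast
import copy
from collections import defaultdict


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers, attribute names, and literals with placeholders."""

    def visit_Name(self, node):
        node.id = "_"
        return node

    def visit_arg(self, node):
        node.arg = "_"
        return node

    def visit_Attribute(self, node):
        node.attr = "_"
        self.generic_visit(node)  # also normalize the object being accessed
        return node

    def visit_Constant(self, node):
        return ast.Constant(value=0)


def find_parameterized_clones(source):
    """Return lists of function names whose bodies match after normalization."""
    tree = ast.parse(source)
    functions = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    groups = defaultdict(list)
    for func in functions:
        # Work on a copy so the original tree is left untouched.
        body = ast.Module(body=copy.deepcopy(func.body), type_ignores=[])
        normalized = _Normalizer().visit(body)
        # ast.dump omits line numbers by default, so equal shapes compare equal.
        groups[ast.dump(normalized)].append(func.name)
    return [names for names in groups.values() if len(names) > 1]
```

Two functions that differ only in variable names, attribute names, or constants end up in the same group, which roughly corresponds to a clone whose points of variation are identifiers and literals. A real tool also has to handle clones smaller or larger than a function, and variations that are whole expressions or statements.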
CloneDR operates on a wide variety of languages: C, C++, C#, Java, PHP, COBOL, Python, Ada, Fortran, EGL and Visual Basic (VBScript, VB6, VB.NET).
The website has a number of sample clone detection reports from a variety of those languages.
This product is available for evaluation at http://www.semanticdesigns.com. Alternatives include Simian and PMD's CPD; of these, CPD is open source.