We inherited some legacy code with a whole lot of code copy/pasted across projects. Is there a way to find these duplicates? PMD can do a single project.
There are also CloneDetective, Simian and SimScan. This paper from the International Conference on Software Engineering 2009 compares them with PMD's CPD.
One tool that can handle several languages is CloneDetective (based on ConQAT, the Continuous Quality Assessment Toolkit): ABAP, Ada, Java, C#, C/C++, Visual Basic, COBOL, PL/I.
Another tool is Simian, the Similarity Analyser, which identifies duplication in Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic and Groovy source code, and even in plain text files. It runs on the JVM and on .NET.
Actually, if you look at .NET, there are a lot of copy paste detection tools...
SimScan, the SimilarityScanner, is an Eclipse/IDEA/JBuilder plugin that finds duplicated or similar fragments of code in large Java source code bases. I haven't used it, and have no idea what "similar fragments" means. It sounds like it might also only look at single projects in isolation, but the IntelliJ screenshots look nifty.
This paper from the International Conference on Software Engineering 2009 compares CloneDetective, PMD's CPD, Simian and SimScan.
PMD's copy & paste finder is actually called CPD, for "copy paste detector", so using that as the technical term when googling helps. Another term often used is "clone detection".
You could try using the command line version of PMD CPD:
http://pmd.sourceforge.net/cpd.html
You should be able to specify multiple source trees to check.
Simian, the other prominent copy/paste detector, has similar command-line capabilities.
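To make concrete what checking several source trees at once amounts to, here is a rough Python sketch. It is not how CPD or Simian work internally; it is a naive line-based approach that hashes every run of `window` consecutive non-blank lines and reports runs that show up under more than one root. The function name `find_cross_project_clones` and its parameters are made up for this illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_cross_project_clones(roots, pattern="*.java", window=6):
    """Return groups of identical `window`-line blocks found in more than one root.

    Hypothetical helper for illustration only. Lines are compared after
    stripping surrounding whitespace, and blank lines are ignored.
    """
    seen = defaultdict(list)  # block hash -> [(root, file, first line number)]
    for root in roots:
        for path in sorted(Path(root).rglob(pattern)):
            text = path.read_text(encoding="utf-8", errors="ignore")
            # Keep (line number, stripped text) for non-blank lines only.
            lines = [(n, s.strip())
                     for n, s in enumerate(text.splitlines(), 1) if s.strip()]
            for i in range(len(lines) - window + 1):
                block = "\n".join(s for _, s in lines[i:i + window])
                digest = hashlib.sha1(block.encode("utf-8")).hexdigest()
                seen[digest].append((str(root), str(path), lines[i][0]))
    # Report only blocks that occur under at least two different roots.
    return [hits for hits in seen.values()
            if len({r for r, _, _ in hits}) > 1]
```

Calling something like `find_cross_project_clones(["projA/src", "projB/src"])` (paths are placeholders) returns one list per duplicated block, each entry giving the root, the file and the starting line. A real tool works on tokens rather than raw lines, so it also catches clones with renamed identifiers or different formatting, which this sketch does not.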