How to detect code duplication during development? [closed]

We have a fairly large code base, 400K LOC of C++, and code duplication is something of a problem. Are there any tools which can effectively detect duplicated blocks of code?

Ideally this would be something that developers could use during development rather than just run occasionally to see where the problems are. It would also be nice if we could integrate such a tool with CruiseControl to give a report after each check in.

I had a look at Duploc some time ago. It showed a nice graph, but it requires a Smalltalk environment, which makes running it automatically rather difficult.

Free tools would be nice, but if there are some good commercial tools I would also be interested.

David Dibben asked Oct 10 '08 14:10




2 Answers

Simian detects duplicate code in C++ projects.

Update: It also works with Java, C#, C, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, and Groovy source code, and even with plain text files.

Simon Steele answered Oct 21 '22 01:10



I've used PMD's Copy/Paste Detector (CPD) and integrated it into CruiseControl with the Ant wrapper script below (make sure the PMD jar is on the classpath).

Our check runs nightly. If you want the output to list only files from the current change set, you will probably need some custom programming. The idea: run the check over all files, then list only the duplicates in which at least one changed file is involved. You have to check all files because a change could copy code from an unchanged file. This should be doable by parsing CPD's XML output. Don't forget to post that script when it's done ;)
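The filtering idea above could be sketched roughly like this in Python. This is only a sketch under assumptions, not a script I have run against CruiseControl: it assumes CPD was run with format="xml" and that the report contains `duplication` elements with `lines` and `tokens` attributes and `file` children with `path` and `line` attributes. Check that against the output of your PMD version. The function name `duplications_touching` and the sample report are made up for illustration, and paths are compared as exact strings, so you may need to normalise them against what your version control reports.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample report, assuming the element layout described above.
SAMPLE_CPD_XML = """
<pmd-cpd>
  <duplication lines="12" tokens="140">
    <file line="10" path="src/a.cpp"/>
    <file line="40" path="src/b.cpp"/>
    <codefragment>int x = 0;</codefragment>
  </duplication>
  <duplication lines="20" tokens="210">
    <file line="5" path="src/c.cpp"/>
    <file line="90" path="src/d.cpp"/>
  </duplication>
</pmd-cpd>
"""


def _local(tag):
    # Drop an XML namespace prefix such as "{uri}duplication", if present.
    return tag.rsplit("}", 1)[-1]


def duplications_touching(cpd_xml, changed_files):
    """Return duplications in which at least one occurrence is in a changed file."""
    changed = set(changed_files)
    root = ET.fromstring(cpd_xml)
    hits = []
    for dup in root.iter():
        if _local(dup.tag) != "duplication":
            continue
        # Each <file> child is one place where the duplicated block occurs.
        occurrences = [
            (f.get("path"), int(f.get("line")))
            for f in dup
            if _local(f.tag) == "file"
        ]
        if any(path in changed for path, _ in occurrences):
            hits.append({
                "lines": int(dup.get("lines")),
                "tokens": int(dup.get("tokens")),
                "occurrences": occurrences,
            })
    return hits


if __name__ == "__main__":
    for hit in duplications_touching(SAMPLE_CPD_XML, {"src/b.cpp"}):
        print(hit["lines"], "duplicated lines in", hit["occurrences"])
```

CPD still scans the whole tree here. Only the report is narrowed, which matches the point that a changed file may duplicate code from an unchanged one.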

For starters, the "text" output should be fine, but you will want to display the results in a user-friendly way. For that I use a Perl script that generates HTML files from CPD's "xml" output. The files are published to the Tomcat instance where CruiseControl's reporting JSP resides, so the developers can view them there and see the results of their dirty hacking :)
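My Perl script is not posted here, but the XML-to-HTML step could look roughly like the following Python sketch. It is an illustration under assumptions, not the script I actually use: it assumes the report contains `duplication` elements (attributes `lines`, `tokens`) with `file` children (attributes `path`, `line`), which you should verify for your PMD version. The function name `cpd_xml_to_html` is hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def _local(tag):
    # Drop an XML namespace prefix such as "{uri}duplication", if present.
    return tag.rsplit("}", 1)[-1]


def cpd_xml_to_html(cpd_xml):
    """Render a CPD XML report as one HTML table, one row per duplication."""
    root = ET.fromstring(cpd_xml)
    rows = []
    for dup in root.iter():
        if _local(dup.tag) != "duplication":
            continue
        # List every location of the duplicated block as "path:line".
        places = "<br>".join(
            "{}:{}".format(html.escape(f.get("path", "")),
                           html.escape(f.get("line", "")))
            for f in dup
            if _local(f.tag) == "file"
        )
        rows.append("<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
            html.escape(dup.get("lines", "")),
            html.escape(dup.get("tokens", "")),
            places,
        ))
    return (
        "<html><body><h1>Duplicated code</h1>"
        "<table border=\"1\">"
        "<tr><th>Lines</th><th>Tokens</th><th>Locations</th></tr>"
        + "".join(rows)
        + "</table></body></html>"
    )
```

Write the returned string to a file in the directory your reporting web application serves. Attribute values are passed through `html.escape`, so unusual characters in file names should not break the page.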

It runs quite fast: less than 2 seconds on 150 KLOC of code (empty lines and comments not counted in that number).

duplicatecheck.xml:

<project name="duplicatecheck" default="cpd">

    <property name="files.dir" value="dir containing your sources"/>
    <property name="output.dir" value="dir containing results for publishing"/>

    <target name="cpd">
        <!-- the PMD jar must be on the classpath for this taskdef to resolve -->
        <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask"/>
        <cpd minimumTokenCount="100"
             language="cpp"
             outputFile="${output.dir}/duplicates.txt"
             ignoreLiterals="false"
             ignoreIdentifiers="false"
             format="text">
            <fileset dir="${files.dir}/">
                <include name="**/*.h"/>
                <include name="**/*.cpp"/>
                <!-- exclude third-party stuff -->
                <exclude name="boost/"/>
                <exclude name="cppunit/"/>
            </fileset>
        </cpd>
    </target>

</project>

user39039 answered Oct 21 '22 01:10
