
I need a tool to find duplicate or similar blocks of text in a single text file or a set of text files

I want to automate moving duplicate or similar C code into functions.

This must work under Linux.

vitaly.v.ch asked Dec 15 '09 15:12



2 Answers

A subset of your problem: Detecting duplicate code:

Try: PMD

Duplicate code can be hard to find, especially in a large project. But PMD's Copy/Paste Detector (CPD) can find it for you! CPD has been through three major incarnations:

  • First we wrote it using a variant of Michael Wise's Greedy String Tiling algorithm (our variant is described here)
  • Then it was completely rewritten by Brian Ewins using the Burrows-Wheeler transform
  • Finally, it was rewritten by Steve Hawkins to use the Karp-Rabin string matching algorithm.

...

Note that CPD works with Java, JSP, C, C++, Fortran and PHP code.
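
To give a feel for the Karp-Rabin idea mentioned above, here is a minimal sketch in Python. It is not CPD's actual implementation (CPD works on language tokens, this works on whole lines), and the names `find_duplicate_blocks`, `normalize` and `window` are made up for illustration. It slides a fixed-size window over the lines, keeps a rolling hash of each window, and reports windows that turn out to be identical after whitespace normalization.

```python
import zlib
from collections import defaultdict

BASE = 257
MOD = (1 << 61) - 1  # large prime modulus for the rolling hash


def normalize(line):
    # Collapse whitespace so indentation changes do not hide a duplicate.
    return " ".join(line.split())


def find_duplicate_blocks(lines, window=3):
    """Return sorted groups of 0-based start indices whose `window`-line
    blocks are identical after normalization."""
    tokens = [normalize(line) for line in lines]
    if window < 1 or len(tokens) < window:
        return []

    # One integer code per line; crc32 is deterministic across runs.
    codes = [zlib.crc32(t.encode("utf-8")) for t in tokens]

    # Weight of the line that leaves the window on each step.
    high = pow(BASE, window - 1, MOD)

    # Hash of the first window.
    h = 0
    for c in codes[:window]:
        h = (h * BASE + c) % MOD

    buckets = defaultdict(list)
    buckets[h].append(0)

    # Roll the hash: drop the outgoing line, add the incoming one.
    for i in range(1, len(codes) - window + 1):
        h = (h - codes[i - 1] * high) % MOD
        h = (h * BASE + codes[i + window - 1]) % MOD
        buckets[h].append(i)

    # Equal hashes are only candidates; compare the real text to rule out
    # hash collisions.
    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        exact = defaultdict(list)
        for s in starts:
            exact[tuple(tokens[s:s + window])].append(s)
        groups.extend(g for g in exact.values() if len(g) > 1)
    return sorted(groups)


if __name__ == "__main__":
    sample = [
        "int a = 1;",
        "a += 2;",
        "print(a);",
        "x();",
        "int a = 1;",
        "  a  += 2;",
        "print(a);",
    ]
    print(find_duplicate_blocks(sample, window=3))
```

For several files you could concatenate their lines and keep a side table mapping each index back to a file name and line number. A real tool also has to handle renamed variables and overlapping matches, which this sketch ignores.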

miku answered Oct 18 '22 23:10



Simian (noted earlier) is a good tool for this. I have been using CloneDetective on my project and it works great. CloneDetective is free, so it can't hurt to give it a try.

Mark Ewer answered Oct 18 '22 23:10
