
How can I compare similar codebases?

We have several C++ projects that were built from the same codebase. There are many similarities and a lot of common code between them, but they were developed independently; source was not shared in any way. Classes and files will have been renamed even where the underlying code hasn't changed, and individual lines will have been tweaked, changed and replaced.

I'd like to be able to compare the different codebases and find out how much of the code is still the same. The result can be fairly high level: a percentage of code that is the same is fine. I also need to be able to automate this process.

Is there a tool that I can run on the codebases and get some sort of report/assessment of how much is common?

Mendokusai asked Oct 25 '25 02:10



2 Answers

I don't have much experience with this sort of thing, but it made me think back to my school days, when our university would run everyone's code through a program to find cheaters. This brought me to the following link:

Source Code Similarity Detection

It names some open source and commercial software that should meet your needs.

RC. answered Oct 26 '25 15:10



There is the Java tool DuDe, part of the MOOSE software reengineering toolkit, by Richard Wettel. It is documented in his (master's?) thesis. MOOSE provides much more than just this; you might also want to look at his CodeCity.

I've used it on Java, C#, Delphi and XML. It should work fine on C++ too. For large codebases, don't forget to give it enough heap space, and start with a simple similarity metric.
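
To give an idea of what a simple similarity metric could look like, here is a rough Python sketch. It is not how DuDe works internally; it is just an illustration under my own assumptions: lines are compared with all whitespace removed, blank lines and `//` comment lines are skipped, and file names are ignored so that renamed files still match. The function names and the list of C++ extensions are made up for the example. It will not see through renamed identifiers, so treat the number as a lower bound.

```python
from collections import Counter
from pathlib import Path

# Assumed set of C++ source extensions; adjust for the projects at hand.
CPP_SUFFIXES = {".h", ".hpp", ".hh", ".c", ".cc", ".cpp", ".cxx"}


def normalized_lines(text):
    """Yield lines with all whitespace removed, skipping blanks and // comments."""
    for line in text.splitlines():
        stripped = "".join(line.split())
        if stripped and not stripped.startswith("//"):
            yield stripped


def line_counts(root):
    """Count normalized lines across every C++ file under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in CPP_SUFFIXES:
            counts.update(normalized_lines(path.read_text(errors="ignore")))
    return counts


def percent_shared(a, b):
    """Percentage of the lines in a that also occur in b (multiset intersection)."""
    total = sum(a.values())
    if total == 0:
        return 0.0
    return 100.0 * sum((a & b).values()) / total
```

Calling `percent_shared(line_counts("projectA"), line_counts("projectB"))` (both directory names are placeholders) gives the share of project A's lines that also appear somewhere in project B. The measure is not symmetric, so it may be worth running it in both directions, and since it is a plain script it is easy to automate.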

Stephan Eggermont answered Oct 26 '25 17:10
