
How do you compare two files containing C code based on code structure, not merely textual differences?

I have two files containing C code which I wish to compare. I'm looking for a utility which will construct a syntax tree for each file, and compare the syntax trees, instead of merely comparing the text of the files. This way minor differences in formatting and style will be ignored. It would be nice to even be able to tell the comparison tool to ignore differences such as variable names, etc.

Correct me if I'm wrong, but diff doesn't have this capability. I'm an Ubuntu user. Thanks!

Corey Jeffco asked Nov 07 '10 05:11



1 Answer

Our SD Smart Differencer does exactly what you want. It uses compiler-quality parsers to read source code and build ASTs for two files you select. It then compares the trees guided by the syntax, so it doesn't get confused by whitespace, layout or comments. Because it normalizes the values of constants, it doesn't get confused by a change of radix or by how you expressed escape sequences!
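To make the idea of a comparison that ignores layout, comments and radix concrete, here is a minimal Python sketch. It is a token-level approximation, not a real AST comparison, and it is not how SD Smart Differencer is implemented. The names `normalized_tokens` and `same_structure` are hypothetical, and the lexer is deliberately simplified: it assumes plain integer literals without suffixes, and it does not handle floats, the preprocessor, or escape-sequence normalization.

```python
import re

# Simplified C lexer (an assumption for illustration, not a full C grammar).
TOKEN_RE = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<string>"(?:\\.|[^"\\])*")
  | (?P<char>'(?:\\.|[^'\\])*')
  | (?P<number>0[xX][0-9a-fA-F]+|\d+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<punct>->|\+\+|--|<<=|>>=|<<|>>|[<>=!+\-*/%&|^]=|&&|\|\||[^\sA-Za-z0-9_])
""", re.VERBOSE | re.DOTALL)


def normalized_tokens(source):
    """Return (kind, text) tokens with comments dropped and integers in decimal."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "comment":
            continue  # comments never affect the comparison
        if kind == "number":
            if text[:2].lower() == "0x":
                value = int(text, 16)   # hexadecimal literal
            elif text.startswith("0") and len(text) > 1:
                value = int(text, 8)    # C octal literal such as 017
            else:
                value = int(text)       # decimal literal
            text = str(value)
        tokens.append((kind, text))
    return tokens


def same_structure(source_a, source_b):
    """True if both snippets yield the same normalized token sequence."""
    return normalized_tokens(source_a) == normalized_tokens(source_b)


if __name__ == "__main__":
    a = "int f(void){return 0x10+1;} /* hex, one line */"
    b = "int f(void)\n{\n    // decimal, reformatted\n    return 16 + 1;\n}\n"
    print(same_structure(a, b))
```

Whitespace never reaches the token list because the lexer only matches non-blank text, so two files that differ only in layout, comments, or the radix of an integer constant compare equal. A tree-based tool goes further by reporting differences in terms of statements and declarations rather than tokens.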

The deltas are reported at the level of the language constructs (variable, expression, statement, declaration, function, ...) in terms of programmer intent (delete, insert, copy, move), complete with determining that an identifier has been renamed consistently throughout a changed block.
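The "renamed consistently" check can be illustrated with a small Python sketch. This is only an assumption about one simple way to detect a one-to-one renaming over a flat token stream; the actual tool works on syntax trees. The function name `consistent_renames`, the abbreviated keyword set, and the crude tokenizer (which ignores comments and string literals) are all hypothetical.

```python
import re

# Abbreviated keyword list; a real tool would use the full set for its C dialect.
C_KEYWORDS = {
    "int", "char", "void", "long", "short", "float", "double", "unsigned",
    "signed", "return", "if", "else", "for", "while", "do", "break",
    "continue", "struct", "union", "enum", "typedef", "static", "const",
    "switch", "case", "default", "sizeof",
}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
NAME_RE = re.compile(r"[A-Za-z_]\w*")


def is_name(token):
    """True for identifiers that are not C keywords."""
    return bool(NAME_RE.fullmatch(token)) and token not in C_KEYWORDS


def consistent_renames(old, new):
    """Return {old_name: new_name} if the snippets differ only by a
    one-to-one identifier renaming; return None otherwise."""
    old_tokens = TOKEN_RE.findall(old)
    new_tokens = TOKEN_RE.findall(new)
    if len(old_tokens) != len(new_tokens):
        return None
    forward, backward = {}, {}
    for a, b in zip(old_tokens, new_tokens):
        if not (is_name(a) and is_name(b)):
            if a != b:
                return None  # a non-identifier token changed
            continue
        # Each old name must map to exactly one new name, and vice versa.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return None
    return {a: b for a, b in forward.items() if a != b}


if __name__ == "__main__":
    old = "int sum(int n){int total=0; total+=n; return total;}"
    new = "int sum(int count){int acc = 0; acc += count; return acc;}"
    print(consistent_renames(old, new))
```

An empty dictionary means the two snippets are identical apart from whitespace, while `None` means they differ in some way other than a consistent renaming.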

The Smart Differencer is available for C (in a number of dialects; if you parse with compiler accuracy, the language dialect matters) as well as for C++, Java, C#, JavaScript, COBOL, Python and many other languages.

If you want to understand how a set of files is related, our SD CloneDR will accept a very large set of files and tell you what they have in common. It finds code that has been copy-paste-edited across the entire set. You don't have to tell it what to look for; it finds it automatically. Using ASTs (as above), it isn't fooled by whitespace changes or renames of identifiers. There are a number of sample clone detection reports for various languages at the web site.
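For a feel of what clone detection across a set of files involves, here is a rough Python sketch that hashes windows of normalized lines. It is a line-based stand-in chosen for brevity, not the AST-based method CloneDR uses, and the names `normalize_line` and `find_clones`, the window size, and the abbreviated keyword set are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Abbreviated keyword list; identifiers outside it are treated as interchangeable.
C_KEYWORDS = {
    "int", "char", "void", "long", "short", "float", "double", "unsigned",
    "return", "if", "else", "for", "while", "do", "break", "continue",
    "struct", "union", "enum", "typedef", "static", "const", "sizeof",
}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
NAME_RE = re.compile(r"[A-Za-z_]\w*")


def normalize_line(line):
    """Replace every non-keyword identifier with ID and normalize spacing."""
    tokens = TOKEN_RE.findall(line)
    return " ".join(
        "ID" if NAME_RE.fullmatch(t) and t not in C_KEYWORDS else t
        for t in tokens
    )


def find_clones(files, window=3):
    """files maps a file name to its source text. Returns groups of
    (file name, starting line number) whose next `window` non-blank
    lines are identical after normalization."""
    seen = defaultdict(list)
    for name, source in files.items():
        lines = [(number, normalize_line(text))
                 for number, text in enumerate(source.splitlines(), 1)]
        lines = [(number, text) for number, text in lines if text]
        for start in range(len(lines) - window + 1):
            chunk = tuple(text for _, text in lines[start:start + window])
            seen[chunk].append((name, lines[start][0]))
    return [places for places in seen.values() if len(places) > 1]


if __name__ == "__main__":
    files = {
        "a.c": "int total(int *v, int n) {\n    int s = 0;\n"
               "    for (int i = 0; i < n; i++)\n        s += v[i];\n"
               "    return s;\n}\n",
        "b.c": "void other(void) {}\nint sum( int *xs, int len )\n{\n"
               "  int acc=0;\n  for(int k=0;k<len;k++)\n    acc+=xs[k];\n"
               "  return acc;\n}\n",
    }
    for group in find_clones(files):
        print(group)
```

Because this sketch works line by line, moving a brace to its own line is enough to hide a match on that line, which is the kind of weakness a tree-based comparison avoids.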

Ira Baxter answered Oct 22 '22 16:10