
How can I perform a diff that ignores all comments?

I have a large codebase that was forked from the original project and I'm trying to track down all the differences from the original. A lot of the file edits consist of commented-out debugging code and other miscellaneous comments. The GUI diff/merge tool Meld under Ubuntu can ignore comments, but only single-line comments.

Is there any other convenient way of finding only the non-comment diffs, either using a GUI tool or Linux command-line tools? In case it makes a difference, the code is a mixture of PHP and JavaScript, so I'm primarily interested in ignoring //, /* */ and #.

Matt V. asked Sep 21 '11 17:09


3 Answers

For a visual diff, you can try Meld or DiffMerge.

DiffMerge

Its rulesets and options provide for customized behavior.

GNU diffutils

From the command line, you can use diff's --ignore-matching-lines=RE option (short form -I RE), for example:

diff -d -I '^#' -I '^ #' file1 file2

Note that a hunk is only suppressed when every changed line in it (every inserted and every deleted line, in both files) matches one of the regexes; otherwise diff still shows the whole hunk.

Use single quotes to protect the pattern from shell expansion. Regex metacharacters (e.g. brackets) still need a backslash inside the pattern if you want to match them literally.

The diffutils manual states:

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.

In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given.

This behavior is also well explained by armel here.
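
The hunk rule quoted above is easy to check on two tiny files. The following is only a sketch, assuming GNU diff; the file names and contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: how diff -I treats hunks (assumes GNU diff; file names are made up).
tmp=$(mktemp -d)

printf '%s\n' '# old comment' 'x = 1' > "$tmp/a.txt"
printf '%s\n' '# new comment' 'x = 1' > "$tmp/b.txt"
printf '%s\n' '# new comment' 'x = 2' > "$tmp/c.txt"

# Only a comment line changed: every changed line matches '^#',
# so the hunk is ignored and diff prints nothing.
comment_only=$(diff -I '^#' "$tmp/a.txt" "$tmp/b.txt" || true)

# A comment line and a code line changed in the same hunk: not every
# changed line matches, so diff prints the whole hunk, comments included.
mixed=$(diff -I '^#' "$tmp/a.txt" "$tmp/c.txt" || true)

printf 'comment-only change: [%s]\n' "$comment_only"
printf 'mixed change:\n%s\n' "$mixed"

rm -rf "$tmp"
```

So -I works best when comment edits are separated from code edits by unchanged lines; comment changes that touch a real code change will still appear in the output.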


See also:

  • How to diff files ignoring comments (lines starting with #)?

Alternatively, check other diff apps, for example:

  • for macOS: Code compare and merge tools
  • for Windows: 3-way merge tools for Windows

kenorb answered Oct 23 '22 10:10


You can filter both files through stripcmt first, which removes C and C++ comments. For # comments, sed 's/#.*//' removes everything from the # to the end of the line.
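
As a rough sketch of the filter-then-diff idea for # comments only: strip_hash below is a hypothetical helper, not part of stripcmt, and the sample files are invented. It also deletes lines left blank after stripping, since otherwise a removed comment line would still show up as an added or deleted empty line.

```shell
#!/bin/sh
# Sketch: strip '#' comments from both files, then diff the results.
# strip_hash is a hypothetical helper; it is naive and will also cut
# a '#' that appears inside a string literal.
strip_hash() {
    # Remove '#' to end of line, then drop lines left empty or blank.
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$1"
}

tmp=$(mktemp -d)
printf '%s\n' '# original header' 'echo one' 'echo two' > "$tmp/orig.sh"
printf '%s\n' '# fork header' 'echo one' '# echo debug' 'echo 2' > "$tmp/fork.sh"

strip_hash "$tmp/orig.sh" > "$tmp/orig.stripped"
strip_hash "$tmp/fork.sh" > "$tmp/fork.stripped"

# Only the real code change (echo two -> echo 2) should remain.
stripped_diff=$(diff "$tmp/orig.stripped" "$tmp/fork.stripped" || true)
printf '%s\n' "$stripped_diff"

rm -rf "$tmp"
```

Keep in mind that line numbers in this diff refer to the stripped files, not to the originals.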

Of course you will lose some context when removing comments first, but on the other hand differences in comments will no longer cause any problems. I think I would do it as follows (described for a single file; automate as required):

  1. If the latest version of the original code base is A and the latest version of the copied code base is B, let's call the versions with comments removed A' and B' (e.g. save those to temporary files while processing).
  2. Find some common origin version and strip the comments from it to get O' (alternatively just re-use B' for this).
  3. Perform a 3-way merge of O', A' and B' and save the result as C'. KDiff3 is an excellent tool for this.
  4. Now you have the code changes you want merged, but C' has no comments. To get back to "normal" mode, do a new 3-way merge with A' as the base and A and C' as the two sides. This picks up the changes between A' and C' (which are the code changes you want) and applies them to the normal code base with comments, based on version A.
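
The four steps can be tried on the command line as well. The sketch below swaps KDiff3 for diff3 -m from GNU diffutils (that substitution is an assumption, not what the steps prescribe), handles only # comments, and uses made-up file names and contents; the strip helper is hypothetical.

```shell
#!/bin/sh
# Sketch of steps 1-4 using diff3 -m in place of KDiff3.
# strip is a hypothetical helper that removes '#' comments and blank lines.
strip() { sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$1"; }

tmp=$(mktemp -d)

# O: common origin, A: latest original, B: latest fork (all invented).
printf '%s\n' 'a = 1' 'b = 2' 'c = 3' 'd = 4' 'e = 5' > "$tmp/O"
printf '%s\n' 'a = 1' 'b = 2' 'c = 3' 'd = 4' '# upstream note' 'e = 5' > "$tmp/A"
printf '%s\n' 'a = 1' '# debug: b was 2' 'b = 20' 'c = 3' 'd = 4' 'e = 5' > "$tmp/B"

# Steps 1-2: comment-free versions A', B' and O'.
strip "$tmp/O" > "$tmp/O.nc"
strip "$tmp/A" > "$tmp/A.nc"
strip "$tmp/B" > "$tmp/B.nc"

# Step 3: apply the fork's code changes (O' -> B') to A', giving C'.
# Argument order for diff3 is MINE OLDER YOURS.
diff3 -m "$tmp/A.nc" "$tmp/O.nc" "$tmp/B.nc" > "$tmp/C.nc"

# Step 4: with A' as the base, merge A (with comments) and C' (code changes).
merged=$(diff3 -m "$tmp/A" "$tmp/A.nc" "$tmp/C.nc")
printf '%s\n' "$merged"

rm -rf "$tmp"
```

In this toy case the result keeps the upstream comment from A, picks up b = 20 from the fork, and leaves the fork's debug comment out. On real files diff3 will insert conflict markers wherever a comment edit and a code edit overlap, which is where a GUI tool such as KDiff3 is more comfortable.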

Drawing version trees on paper before you start is highly recommended, to get a clear picture of which versions you want to work on. But don't be limited by what the tree shows: you can merge any version in any direction, as long as you figure out which versions to use.

hlovdal answered Oct 23 '22 10:10


diff <file1> <file2> | grep -v '^[<>]\ #'

Far from perfect, but it will give an idea of the differences.
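
The same filter can be widened to cover // comments and indented comment lines. This is only a sketch with invented sample files; block comments (/* */) spanning several lines are not handled, and the hunk headers and --- separators are left in the output.

```shell
#!/bin/sh
# Sketch: hide changed lines that are whole-line '#' or '//' comments.
# Sample files are made up for illustration.
tmp=$(mktemp -d)
printf '%s\n' '$x = 1;' '// old note' '$y = 2;' > "$tmp/a.php"
printf '%s\n' '$x = 1;' '// new note' '# debug' '$y = 3;' > "$tmp/b.php"

# Drop diff lines whose content starts (after optional whitespace)
# with '#' or '//'; everything else passes through unchanged.
filtered=$(diff "$tmp/a.php" "$tmp/b.php" | grep -Ev '^[<>][[:space:]]*(#|//)')
printf '%s\n' "$filtered"

rm -rf "$tmp"
```

Because the filter works on diff's output rather than its input, hunks that contain only comment changes still leave their header lines behind.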

Vadym Tyemirov answered Oct 23 '22 12:10