
How to make 'git diff' ignore comments

Tags: git, git-diff, diff

I am trying to produce a list of the files that were changed in a specific commit. The problem is that every file has the version number in a comment at the top, and since this commit introduces a new version, every file has changed.

I don't care about the changed comments, so I would like to have git diff ignore all lines that match ^\s*\*.*$, as these are all comments (part of /* */).

I cannot find any way to tell git diff to ignore specific lines.

I have already tried setting a textconv attribute so that Git passes the files through sed before diffing them, letting sed strip out the offending lines. The problem is that git diff --name-status does not actually diff the files; it just compares the hashes, and of course all the hashes have changed.

Is there a way to do this?

asked May 13 '13 16:05 by Benubird



2 Answers

Here is a solution that is working well for me. I've written up the solution and some additional missing documentation on the git (log|diff) -G<regex> option.

It is basically using the same solution as in previous answers, but specifically for comments that start with a * or a #, and sometimes a space before the *... But it still needs to allow #ifdef, #include, etc. changes.

Look-ahead and look-behind do not seem to be supported by the -G option, nor does ? in general, and I have had problems using * too. + seems to work well, though.

(Note, tested on Git v2.7.0)

Multi-Line Comment Version

git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])' 
  • -w ignores whitespace differences
  • -G only shows files whose diff contains an added or removed line matching the following regex
  • (^[^\*# /]) any line that does not start with a star, a hash, a space, or a slash
  • (^#\w) any line that starts with # followed by a word character (so #ifdef and #include still count)
  • (^\s+[^\*#/]) any line that starts with some whitespace followed by a character that is not a comment character (*, #, or /)
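To get a feel for which changed lines the pattern above counts as a match, the alternation can be tried against a few sample lines with grep -E. This is only a rough sketch: it assumes GNU grep (for \s and \w), the sample lines are made up for illustration, and grep is a stand-in here, since -G itself matches against the added/removed lines of the patch.

```shell
# The regex from the command above.
pattern='(^[^\*# /])|(^#\w)|(^\s+[^\*#/])'

# Hypothetical changed lines, one per line.
samples=$(printf '%s\n' \
  ' * @version 2.0' \
  '  * indented by two spaces' \
  '// line comment' \
  '# shell comment' \
  '#include <stdio.h>' \
  '    return x + 1;' \
  'int main(void) {')

# Lines printed here are the ones the pattern matches.
matched=$(printf '%s\n' "$samples" | grep -E -e "$pattern")
printf '%s\n' "$matched"
```

In this sketch the comment line indented by two spaces is also printed: the last bracket expression does not exclude whitespace, so \s+ can stop one character early and let the second space satisfy it. If comment blocks in a project are indented by more than one space, the pattern may need adjusting.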

For context: an SVN hook currently rewrites a multi-line comment block in every file on the way in and out. With this filter I can diff my changes against SVN without the informational text that SVN drops into the comments.

Technically this will allow for Python and Bash comments like #TODO to be shown in the diff, and if a division operator started on a new line in C++ it could be ignored:

a = b
    / c;

Also the documentation on -G in Git seemed pretty lacking, so the information here should help:

git diff -G<regex>

-G<regex>

Look for differences whose patch text contains added/removed lines that match <regex>.

To illustrate the difference between -S<regex> --pickaxe-regex and -G<regex>, consider a commit with the following diff in the same file:

+    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
...
-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);

While git log -G"regexec\(regexp" will show this commit, git log -S"regexec\(regexp" --pickaxe-regex will not (because the number of occurrences of that string did not change).
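The difference can be reproduced in a throwaway repository. The sketch below is an illustration under assumptions: the repository path, file name, identity, and commit messages are invented, and it relies only on git init, git commit, and git log with the -G, -S, and --pickaxe-regex options.

```shell
# Throwaway repository; names and contents are illustrative only.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name demo
git -C "$repo" config user.email demo@example.com

# First commit introduces one call to regexec(regexp, ...).
printf '%s\n' 'hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);' > "$repo/f.c"
git -C "$repo" add f.c
git -C "$repo" commit -q -m 'add call'

# Second commit rewrites the line but keeps exactly one occurrence.
printf '%s\n' 'return !regexec(regexp, two->ptr, 1, &regmatch, 0);' > "$repo/f.c"
git -C "$repo" commit -q -a -m 'rewrite call'

# -G: commits whose added/removed lines match the regex.
g_hits=$(git -C "$repo" log --format=%s -G'regexec\(regexp')
# -S with --pickaxe-regex: commits where the number of matches changed.
s_hits=$(git -C "$repo" log --format=%s -S'regexec\(regexp' --pickaxe-regex)

printf 'G: %s\n' "$g_hits"
printf 'S: %s\n' "$s_hits"
```

With -G both commits should be listed, because each one adds or removes a matching line. With -S only the first commit should be listed, because the second commit leaves the number of occurrences unchanged.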

See the pickaxe entry in gitdiffcore(7) for more information.

(Note, tested on Git v2.7.0)

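  • -G uses a POSIX extended regular expression (the source excerpt in the update below sets REG_EXTENDED), not a basic one.
  • In my tests ?, *, {, and } did not behave as expected, even though POSIX extended syntax defines them; ! is not a regex operator at all.
  • Grouping with () works, and groups can be OR-ed with |.
  • Character-class shorthands such as \s, \W, etc. are supported.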
  • Look-ahead and look-behind are not supported.
  • Beginning and ending line anchors ^$ work.
  • Feature has been available since Git 1.7.4.

Excluded Files v Excluded Diffs

Note that the -G option filters which files are diffed, not which lines are shown.

If a file passes the filter, its whole diff is shown, including the changed lines that the regex was meant to exclude.
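A small experiment makes the distinction concrete. This is a sketch under assumptions: the repository, the file names a.txt and b.txt, and their contents are invented for illustration.

```shell
# Throwaway repository with two hypothetical files.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name demo
git -C "$repo" config user.email demo@example.com

printf '%s\n' 'foo 1' 'bar 1' > "$repo/a.txt"
printf '%s\n' 'bar 1' > "$repo/b.txt"
git -C "$repo" add a.txt b.txt
git -C "$repo" commit -q -m 'initial'

# Change both files in the working tree; only a.txt touches a "foo" line.
printf '%s\n' 'foo 2' 'bar 2' > "$repo/a.txt"
printf '%s\n' 'bar 2' > "$repo/b.txt"

# b.txt is filtered out, but a.txt is shown in full, "bar" change included.
out=$(git -C "$repo" diff -G'foo')
printf '%s\n' "$out"
```

The output should mention only a.txt, yet still contain the bar change from that file. So -G is useful for deciding which files to look at, not for hiding individual lines.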

Examples

Only show file differences with at least one line that mentions foo.

git diff -G'foo' 

Show file differences for files where at least one changed line does not start with a #

git diff -G'^[^#]' 

Show files that have differences mentioning FIXME or TODO

git diff -G'(FIXME)|(TODO)'

See also git log -G, git grep, git log -S, --pickaxe-regex, and --pickaxe-all

UPDATE: Which regular expression tool is in use by the -G option?

https://github.com/git/git/search?utf8=%E2%9C%93&q=regcomp&type=

https://github.com/git/git/blob/master/diffcore-pickaxe.c

if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
        cflags |= REG_ICASE;
    regcomp_or_die(&regex, needle, cflags);
    regexp = &regex;

// and in the regcomp_or_die function
regcomp(regex, needle, cflags);

http://man7.org/linux/man-pages/man3/regexec.3.html

   REG_EXTENDED
          Use POSIX Extended Regular Expression syntax when interpreting
          regex.  If not set, POSIX Basic Regular Expression syntax is
          used.

// ...

   REG_NEWLINE
          Match-any-character operators don't match a newline.

          A nonmatching list ([^...]) not containing a newline does not
          match a newline.

          Match-beginning-of-line operator (^) matches the empty string
          immediately after a newline, regardless of whether eflags, the
          execution flags of regexec(), contains REG_NOTBOL.

          Match-end-of-line operator ($) matches the empty string
          immediately before a newline, regardless of whether eflags
          contains REG_NOTEOL.

answered Oct 04 '22 00:10 by phyatt


git diff -G<regex>

And specify a regular expression that does not match your version number line.
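Applied to the original question (listing the files changed in a commit while ignoring version-comment churn), this might look like the sketch below. Everything in it is an assumption made for illustration: the repository, the files a.c and b.c, and in particular the regex, which uses POSIX character classes to match changed lines whose first non-blank character is not a star. It is not taken from either answer.

```shell
# Throwaway repository; two hypothetical files with a version header.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name demo
git -C "$repo" config user.email demo@example.com

for f in a.c b.c; do
  printf '%s\n' '/*' ' * version 1' ' */' 'int x = 1;' > "$repo/$f"
done
git -C "$repo" add a.c b.c
git -C "$repo" commit -q -m 'v1'

# New version: every header changes, but only b.c changes code.
printf '%s\n' '/*' ' * version 2' ' */' 'int x = 1;' > "$repo/a.c"
printf '%s\n' '/*' ' * version 2' ' */' 'int x = 2;' > "$repo/b.c"
git -C "$repo" commit -q -a -m 'v2'

# List files with at least one changed line that is not a " * ..." comment line.
changed=$(git -C "$repo" diff --name-only -G'^[[:space:]]*[^*[:space:]]' HEAD~1 HEAD)
printf '%s\n' "$changed"
```

Here only b.c should be listed, since the only changed lines in a.c are comment lines. Excluding whitespace in the final bracket keeps the regex from matching an indented comment line through backtracking.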

answered Oct 04 '22 00:10 by riezebosch