Dealing with hunks that only change whitespace

Question

In a code I am maintaining, I sometimes receive pull requests where the committer just reflowed a paragraph for no apparent reason. Here is an example:

diff --git a/knuth.tex b/knuth.tex
index 2f6a2f8..7b0827d 100644
--- a/knuth.tex
+++ b/knuth.tex
@@ -1,6 +1,6 @@
 Thus, I came to the conclusion that the designer of a new
 system must not only be the implementer and first
-large||scale user; the designer should also write the first
+large-scale user; the designer should also write the first
 user manual.

 The separation of any of these four components would have
@@ -9,8 +9,7 @@ all these activities, literally hundreds of improvements
 would never have been made, because I would never have
 thought of them or perceived why they were important.

-But a system cannot be successful if it is too strongly
-influenced by a single person. Once the initial design is
-complete and fairly robust, the real test begins as people
-with many different viewpoints undertake their own
-experiments.
+But a system cannot be successful if it is too strongly influenced by
+a single person. Once the initial design is complete and fairly
+robust, the real test begins as people with many different viewpoints
+undertake their own experiments.

As you can see the first hunk introduced an actual change by substituting || with -, whereas the second hunk does not change anything but line breaking and whitespace. In fact, the word-diff of the second hunk would be empty.

Is it possible to either put a policy in place (e.g. on GitHub or in my CI) to reject commits containing such “empty” hunks, or reformat the patch to omit these hunks altogether so that I can git apply it without them?

Related: How to git-apply a git word diff

bk2204 · Accepted Answer

If you're looking for a built-in solution, I'm not aware of one. However, that doesn't mean this can't be relatively easily built into a CI system.

You could pipe the output of an appropriate git diff command into a script like the following, which will exit 1 if fed a patch containing a hunk like the second above.

#!/usr/bin/env ruby

def filter(arr)
  arr.join.split("

").map { |x| x.gsub(/\s+/, ' ') }.join("

")
end

def should_reject(before, after)
  return false if before.empty? && after.empty?
  before = filter(before)
  after = filter(after)
  return true if before == after
  false
end

chunk = nil
before = []
after = []
while (line = gets)
  trimmed = line[1..-1]
  case line
  when /^(\+\+\+|---)/
    # Do nothing.
  when /^@@ /
    if should_reject(before, after)
      warn "Useless change to hunk #{chunk}"
      exit 1
    end
    chunk = line
    before = []
    after = []
  when /^ /
    before << trimmed
    after << trimmed
  when /^\+/
    after << trimmed
  when /^-/
    before << trimmed
  end
end

if should_reject(before, after)
  warn "Useless change to hunk #{chunk}"
  exit 1
end

It essentially splits each hunk into chunks with a blank line between them, turns all whitespace into spaces, and then compares them. If they're equal, it complains and exits nonzero. You may wish to modify it to be more robust, like dealing with CRLF endings or such, but the approach is viable.

As a side note, one approach that makes these kinds of changes easier to spot is to use the sentence-per-line style. Each sentence, regardless of length, is on one whole line, and there is only one sentence per line. This makes diffing any sort of change much easier and avoids wrapping issues altogether.

Dealing with hunks that only change whitespace

Tags:

git

diff

patch

whitespace

Henri Menke

1 Answers

bk2204

Recent Activity

Donate For Us

Dealing with hunks that only change whitespace

Tags:

git

diff

patch

whitespace

Henri Menke

1 Answers

bk2204

Related questions

Recent Activity

Donate For Us