Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

New git diff compaction heuristic isn't working

Tags:

git

git-diff

Update

This is solved now, thanks to @torek's answer

I needed to have more lines above the new addition so that Git could move up the diff'ed block, and by that I mean:

+a
+b
+c
+
+["foo", "bar", "baz"].map do |i|
+  i
+end
+
 ["foo", "bar", "baz"].map do |i|
   i.upcase
 end

Note: I tried with a single line-break instead of a\nb\nc\n and it also worked as well


Original Question...

I'm using Git 2.9 on Mac OSX

Here is a reduced test case:

$ mkdir git-highlight && cd git-highlight
$ touch foo.rb

I add and commit the following content:

["foo", "bar", "baz"].map do |i|
  i.upcase
end

Now I modify the file to have the following content:

["foo", "bar", "baz"].map do |i|
  i
end

["foo", "bar", "baz"].map do |i|
  i.upcase
end

If I was to run either git diff or git diff --compaction-heuristic then I get the following unexpected output:

diff --git a/foo.rb b/foo.rb
index 9056b22..f0d289a 100644
--- a/foo.rb
+++ b/foo.rb
@@ -1,3 +1,7 @@
 ["foo", "bar", "baz"].map do |i|
+  i
+end
+
+["foo", "bar", "baz"].map do |i|
   i.upcase
 end

If you read this blog post from GitHub https://github.com/blog/2188-git-2-9-has-been-released I'm led to believe that my output should look something more like:

+["foo", "bar", "baz"].map do |i|
+  i
+end
+
 ["foo", "bar", "baz"].map do |i|
   i.upcase
 end

The idea being git's diffing algorithm is more intelligent and able to identify the block change.

I've also tried adding the following to my ~/.gitconfig but it doesn't make any difference to the outcome, I still get the unexpected output:

[diff]
  compactionHeuristic = true

Any ideas on what I'm missing here?

like image 947
Integralist Avatar asked Jun 18 '16 08:06

Integralist


2 Answers

Speaking of "git diff compaction heuristic isn't working"... Git 2.12 (Q1 2017) will retire that heuristic.
Git 2.14 will set the indent heuristic instead as the default one.

"git diff" and its family had two experimental heuristics to shift the contents of a hunk to make the patch easier to read.
One of them turns out to be better than the other, so leave only the "--indent-heuristic" option and remove the other one.

See commit 3cde4e0 (23 Dec 2016) by Junio C Hamano (gitster).
Suggested-by: Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 2ced5f2, 10 Jan 2017)

In details:

diff: retire "compaction" heuristics

When a patch inserts a block of lines, whose last lines are the same as the existing lines that appear before the inserted block, "git diff" can choose any place between these existing lines as the boundary between the pre-context and the added lines (adjusting the end of the inserted block as appropriate) to come up with variants of the same patch, and some variants are easier to read than others.

We have been trying to improve the choice of this boundary, and Git 2.11 shipped with an experimental "compaction-heuristic".
Since then another attempt to improve the logic further resulted in a new "indent-heuristic" logic.
It is agreed that the latter gives better result overall, and the former outlived its usefulness.

Retire "compaction", and keep "indent" as an experimental feature.
The latter hopefully will be turned on by default in a future release, but that should be done as a separate step.


Update for the next Git 2.15.x/2.16 (Q1 2018):

See commit bab7614 (29 Oct 2017) by Carlos Martín Nieto (carlosmn).
(Merged by Junio C Hamano -- gitster -- in commit 662ac3b, 06 Nov 2017)

diff: --indent-heuristic is no longer experimental

This heuristic has been the default since 2.14 so we should not confuse our users by saying that it's experimental and off by default.


Note: "git diff --indent-heuristic" had a bad corner case performance, fixed in Git 2.19 (Q3 2018).

See commit 301ef85 (27 Jul 2018) by Stefan Beller (stefanbeller).
(Merged by Junio C Hamano -- gitster -- in commit 791ad49, 17 Aug 2018)

xdiff: reduce indent heuristic overhead

Skip searching for better indentation heuristics if we'd slide a hunk more than its size.
This is the easiest fix proposed in the analysis in response to a patch that mercurial took for xdiff to limit searching by a constant.
Using a performance test as:

#!python
open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 1000001)

This patch reduces the execution of "git diff --no-index a b" from 0.70s to 0.31s. However limiting the sliding to the size of the diff hunk, which was proposed as a solution (that I found easiest to implement for now) is not optimal for cases like:

open('a', 'w').write(" \n" * 1000000)
open('b', 'w').write(" \n" * 2000000)

as then we'd still slide 1000000 times.

In addition to limiting the sliding to size of the hunk, also limit by a constant. Choose 100 lines as the constant as that fits more than a screen, which really means that the diff sliding is probably not providing a lot of benefit anyway.

like image 84
VonC Avatar answered Nov 07 '22 20:11

VonC


I have not played with the new features yet, so this may need anywhere from a grain of salt to a full salt lick, but:

Any ideas on what I'm missing here?

The description specifically says that git diff hunks are moved up until reaching a blank line. There is no blank line above line 1; if there were, that might suffice. (It might require even more "above" context—which need not be blank—since the norm is to include three lines of context above and below, and it's not clear from the brief description whether the "moving up" will be stymied by the lack of additional lines.)

like image 40
torek Avatar answered Nov 07 '22 21:11

torek