
In the context of git (and diff), what is a "hunk"?

Tags:

git

diff

I was looking for a definition of "hunk" while reading some git documentation.

I know it means a description of the difference between two files and that it has a well-defined format, but I couldn't call to mind a succinct definition.

I tried searching with Google, but there were a lot of spurious hits.

nzc asked Jun 03 '16 17:06



2 Answers

And eventually I found this:

When comparing two files, diff finds sequences of lines common to both files, interspersed with groups of differing lines called hunks.

here: http://www.gnu.org/software/diffutils/manual/html_node/Hunks.html

Which was exactly the kind of succinct definition I was looking for. Hopefully this helps someone else out!

nzc answered Oct 09 '22 09:10


The term "hunk" is indeed not specific to Git; it comes from the GNU diffutils format. Even more succinctly:

Each hunk shows one area where the files differ.

But the challenge for Git is to determine the right boundaries for a hunk.
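To make the definition concrete, here is a minimal Python sketch that produces a unified diff with the standard-library difflib module and splits it into hunks at each "@@" header. It is only an illustration: the helper name split_hunks, the sample data and the one-line context are my own choices, and difflib does not use Git's diff machinery or heuristics.

```python
import difflib
import re

# Unified-diff hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def split_hunks(old_lines, new_lines, context=1):
    """Return a list of (header, body_lines) pairs, one pair per hunk.

    header is (old_start, old_count, new_start, new_count).
    split_hunks is a hypothetical helper written for this example.
    """
    hunks = []
    diff = difflib.unified_diff(old_lines, new_lines, lineterm="", n=context)
    for line in diff:
        match = HUNK_HEADER.match(line)
        if match:
            old_start, old_count, new_start, new_count = match.groups()
            # A missing count in a unified-diff header means 1.
            header = (int(old_start), int(old_count or 1),
                      int(new_start), int(new_count or 1))
            hunks.append((header, []))
        elif hunks:
            # Lines before the first header ("---" / "+++") are skipped.
            hunks[-1][1].append(line)
    return hunks


old = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
new = ["a", "B", "c", "d", "e", "f", "g", "h", "I", "j"]
hunks = split_hunks(old, new)

for header, body in hunks:
    print(header)
    print("\n".join(body))
```

The two changed lines are far enough apart that they end up in two separate hunks, each with its own header and its own surrounding context lines.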

The rest of the answer helps illustrate what a hunk looks like in Git:

After trying various heuristics (such as the compaction heuristic, which is gone in Git 2.12), the Git maintainers settled on the indent heuristic, which was introduced in commit 433860f (Sept. 2016) and shipped with Git 2.11.

Some groups of added/deleted lines in diffs can be slid up or down, because lines at the edges of the group are not unique.
Picking good shifts for such groups is not a matter of correctness but definitely has a big effect on aesthetics.
For example, consider the following two diffs.
The first is what standard Git emits:

--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
 }
 
 if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {
                        $smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an aesthetic point of view:

--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
        $initial_reply_to =~ s/(^\s+|\s+$)//g;
 }
 
+if (!$smtp_server) {
+       $smtp_server = $repo->config('sendemail.smtpserver');
+}
 if (!$smtp_server) {
        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders" using heuristics that take the positions of nearby blank lines and the indentation of nearby lines into account.
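To see why the two diffs quoted above are equivalent, here is a small Python sketch of a "slider". It is not Git's implementation: insert_group and slide_up are hypothetical helpers, and the sample lines are a trimmed copy of the git-send-email.perl context. The idea is that an inserted group can move up one line whenever the line just above it equals the last line of the group.

```python
def insert_group(old_lines, position, group):
    """Apply a pure-insertion hunk: add `group` before index `position`."""
    return old_lines[:position] + group + old_lines[position:]


def slide_up(old_lines, position, group):
    """Slide an inserted group up by one line, if the edge lines allow it.

    Returns (new_position, new_group), or None when the group cannot move.
    """
    if position == 0 or not group or old_lines[position - 1] != group[-1]:
        return None
    # The line above the group becomes its first line; the old last line
    # of the group is now supplied by the unchanged context instead.
    return position - 1, [group[-1]] + group[:-1]


old = [
    "}",
    "",
    "if (!$smtp_server) {",
    "        foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {",
]

# Placement A: the group starts after the existing "if" line (first diff).
group_a = [
    "        $smtp_server = $repo->config('sendemail.smtpserver');",
    "}",
    "if (!$smtp_server) {",
]

# Placement B: the same group slid up by one line (second diff).
position_b, group_b = slide_up(old, 3, group_a)

result_a = insert_group(old, 3, group_a)
result_b = insert_group(old, position_b, group_b)
print(result_a == result_b)
```

Both placements produce the same file, so neither is "more correct". Placement B cannot slide any further up, because the line above it is blank and does not match the closing brace at the end of the group.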


With Git 2.14 (Q3 2017), that indent heuristic became the default!

See commit 1fa8a66 (08 May 2017) by Jeff King (peff).
See commit 33de716 (08 May 2017) by Stefan Beller (stefanbeller).
See commit 37590ce, commit cf5e772 (08 May 2017) by Marc Branchaud.
(Merged by Junio C Hamano -- gitster -- in commit 53083f8, 05 Jun 2017)

diff: enable indent heuristic by default

The feature was included in v2.11 (released 2016-11-29) and we got no negative feedback. Quite the opposite, all feedback we got was positive.

Turn it on by default. Users who dislike the feature can turn it off by setting diff.indentHeuristic.


With Git 2.24 (Q4 2019), the documentation of the "indent heuristic", which decides where to place diff hunk boundaries, has been corrected.

See commit 64e5e1f (15 Aug 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit e115170, 09 Sep 2019)

diff: 'diff.indentHeuristic' is no longer experimental

The indent heuristic started out as experimental, but it's now our default diff heuristic since 33de716 (diff: enable indent heuristic by default, 2017-05-08, Git v2.14.0-rc0).
Alas, that commit didn't update the documentation, and the description of the 'diff.indentHeuristic' configuration variable still implies that it's experimental and not the default.

Update the description of 'diff.indentHeuristic' to make it clear that it's the default diff heuristic.

The description of the related '--indent-heuristic' option has already been updated in this answer.

The documentation will now read:

diff.indentHeuristic:

Set this option to false to disable the default heuristics that shift diff hunk boundaries to make patches easier to read.
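As a rough sketch of how that setting can be toggled for a single command, the Python helper below only builds the argument list; it does not run Git. The function name git_diff_argv and the file name are made up for the example. The pieces it assembles (git -c name=value, and the --indent-heuristic / --no-indent-heuristic options of git diff) are existing Git features.

```python
def git_diff_argv(paths, indent_heuristic=True, via_config=False):
    """Build a `git diff` command line with the indent heuristic on or off.

    via_config=True sets diff.indentHeuristic for this one invocation with
    `git -c`; otherwise the matching command-line option is used.
    git_diff_argv is a hypothetical helper written for this example.
    """
    argv = ["git"]
    if via_config:
        value = "true" if indent_heuristic else "false"
        argv += ["-c", f"diff.indentHeuristic={value}"]
    argv.append("diff")
    if not via_config:
        argv.append("--indent-heuristic" if indent_heuristic
                    else "--no-indent-heuristic")
    argv += ["--", *paths]
    return argv


with_flag = git_diff_argv(["git-send-email.perl"], indent_heuristic=False)
with_config = git_diff_argv(["git-send-email.perl"], indent_heuristic=False,
                            via_config=True)
print(" ".join(with_flag))
print(" ".join(with_config))
```

Either list can be passed to subprocess.run inside a repository. To make the change permanent rather than per command, run git config diff.indentHeuristic false instead.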


With Git 2.25 (Q1 2020), you don't even have to specify --indent-heuristic anymore (since it has been the default for quite some time now).

See commit 44ae131 (28 Oct 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 532d983, 01 Dec 2019)

builtin/blame.c: remove '--indent-heuristic' from usage string

Signed-off-by: SZEDER Gábor

The indent heuristic is our default diff heuristic since 33de716387 ("diff: enable indent heuristic by default", 2017-05-08, Git v2.14.0-rc0 -- merge listed in batch #7), but the usage string of 'git blame' still mentions it as "experimental heuristic".

We could simply update the short help associated with the option, but according to the comment above the option's declaration it was "only included here to get included in the "-h" output".

That made sense while the feature was still experimental and we wanted to give it more exposure, but nowadays it's unnecessary.

So let's rather remove the '--indent-heuristic' option from 'git blame's usage string.

Note that 'git blame' will still accept this option, as it is parsed in parse_revision_opt().

Astute readers may notice that this patch removes a comment mentioning "the following two options", but it only removes one option.

The reason is that the comment is outdated: that other options was '--compaction-heuristic', and it has already been removed in 3cde4e02ee (diff: retire "compaction" heuristics, 2016-12-23), but that commit forgot to update this comment.

VonC answered Oct 09 '22 09:10