 

Does increasing the number of unified diff context lines have any downsides?

Tags:

git

diff

By default, diff -u and git diff produce unified diffs with three lines of context. Apart from the larger size of the diff file itself, is there any disadvantage to increasing the number of context lines? I assume it may help in cases where the file(s) to be patched have been modified since the patch was made. Specifically: if you increase the number of context lines, are there cases where patch will fail where it would otherwise have succeeded?

Andrew Ferrier asked Apr 15 '13 11:04


People also ask

Does git diff have a limit?

git diff itself does not impose these limits. The figures below are limits that a hosting service applies when displaying diffs for pull requests (they appear to be GitHub's): a file's diff cannot exceed 2000 changed lines or 102,400 bytes (100 KB) of raw diff data, the entire diff cannot exceed 8000 changed lines, and a single diff is limited to 200 files.

What is a unified diff?

The unified format (or unidiff) inherits the technical improvements made by the context format, but produces a smaller diff, with old and new text presented immediately adjacent to each other. Unified format is usually invoked with the -u command-line option. This output is often used as input to the patch program.

What is unified format?

The unified output format is a variation on the context format that is more compact because it omits redundant context lines. To select this output format, use the --unified[=lines] (-U lines) or -u option. The argument lines is the number of lines of context to show; if it is not given, three lines are shown.
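To see how the lines argument changes a hunk, here is a minimal sketch that uses Python's difflib.unified_diff as a stand-in for diff -U: its n parameter plays the role of lines. The file contents are the f1/f2 example from the answer below; the helper name unified is hypothetical, and difflib omits the timestamps that diff prints on the ---/+++ lines.

```python
import difflib

# f1 and f2 from the answer: line "f" is changed to "f2".
f1 = ["a", "b", "c", "d", "e", "f", "g"]
f2 = ["a", "b", "c", "d", "e", "f2", "g"]


def unified(old, new, context):
    """Unified-diff lines for two lists of lines, with `context` lines of
    context around each change (the analogue of diff -U<context>)."""
    return list(difflib.unified_diff(old, new, "f1", "f2", n=context, lineterm=""))


for n in (1, 3, 5):
    # Index 2 is the @@ hunk header; the header shows how many lines the hunk spans.
    print(n, unified(f1, f2, n)[2])
```

With n=1 the header is @@ -5,3 +5,3 @@ and with n=5 it is @@ -1,7 +1,7 @@, the same hunks the answer shows for diff -U1 and diff -U5; n=3 corresponds to the default of plain -u.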

How do patch files work?

patch is a command that takes a diff (usually saved to a file from the output of diff) and applies the changes it describes to another file. For example, a common use is to transfer the changes from a modified file to the original file, making the two identical.


1 Answer

Yes. Consider the following case:

There's a file f1:

a
b
c
d
e
f
g

You change line f to f2 (saving the result as f2), and get either

--- f1  2013-04-15 13:23:57.524966109 +0200
+++ f2  2013-04-15 13:25:17.832965720 +0200
@@ -5,3 +5,3 @@
 e
-f
+f2
 g

or

--- f1  2013-04-15 13:23:57.524966109 +0200
+++ f2  2013-04-15 13:25:17.832965720 +0200
@@ -1,7 +1,7 @@
 a
 b
 c
 d
 e
-f
+f2
 g

depending on whether you use the -U1 or the -U5 option with diff. Now assume that someone else edited the upper part of the original file, producing f3:

a
b1
c
d
e
f
g

Saving the two diffs as u1.patch and u5.patch and applying each one to a fresh copy of f3 gives:

$ patch f3 < u1.patch 
patching file f3
$ patch f3 < u5.patch 
patching file f3
Hunk #1 succeeded at 1 with fuzz 2.

The patch applied successfully in both cases; however, the second run needed a fuzz factor of 2. What does that mean? From man patch:

First patch looks for a place where all lines of the context match. If no such place is found, and it's a context diff, and the maximum fuzz factor is set to 1 or more, then another scan takes place ignoring the first and last line of context. If that fails, and the maximum fuzz factor is set to 2 or more, the first two and last two lines of context are ignored, and another scan is made.

As this description shows, every extra context line is one more line that must match, so in a scenario like this the patch created with -U5 needs additional scans (and therefore more time) to apply. Worse, if the maximum fuzz factor patch is allowed to use is too small (the default is 2; it can be changed with -F), applying the patch fails.
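The fuzz procedure quoted from man patch can be sketched in Python. This is a simplified, hypothetical model, not GNU patch's implementation: it handles a single hunk, searches from the top of the file instead of outward from the hunk's line number, and ignores patch's special handling of hunks at the start or end of a file. The helper names parse_hunk, find and apply_with_fuzz, and the extra file f4, are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Simplified, hypothetical model of patch's fuzz matching (not GNU patch's code).


def parse_hunk(body: List[str]) -> Tuple[List[str], List[str], int, int]:
    """Split hunk body lines (the lines after the @@ header) into the text the
    hunk expects, the text it produces, and its leading/trailing context counts."""
    old = [line[1:] for line in body if line[0] in " -"]
    new = [line[1:] for line in body if line[0] in " +"]
    lead = 0
    while lead < len(body) and body[lead][0] == " ":
        lead += 1
    trail = 0
    while trail < len(body) - lead and body[-1 - trail][0] == " ":
        trail += 1
    return old, new, lead, trail


def find(lines: List[str], pattern: List[str]) -> Optional[int]:
    """Index of the first contiguous occurrence of pattern in lines, or None."""
    for i in range(len(lines) - len(pattern) + 1):
        if lines[i:i + len(pattern)] == pattern:
            return i
    return None


def apply_with_fuzz(lines: List[str], body: List[str],
                    max_fuzz: int = 2) -> Optional[Tuple[List[str], int]]:
    """Try fuzz 0, 1, ..., max_fuzz in turn; at fuzz k, ignore up to k lines of
    leading and k lines of trailing context. Returns (patched_lines, fuzz_used),
    or None if the hunk cannot be placed."""
    old, new, lead, trail = parse_hunk(body)
    for fuzz in range(max_fuzz + 1):
        front, back = min(fuzz, lead), min(fuzz, trail)
        # Context lines are identical on both sides, so trimming the same
        # number from old and new keeps them aligned.
        pattern = old[front:len(old) - back]
        replacement = new[front:len(new) - back]
        pos = find(lines, pattern)
        if pos is not None:
            return lines[:pos] + replacement + lines[pos + len(pattern):], fuzz
    return None


# Hunk bodies of the -U1 and -U5 diffs from the answer.
u1 = [" e", "-f", "+f2", " g"]
u5 = [" a", " b", " c", " d", " e", "-f", "+f2", " g"]
# f3: someone else changed b to b1.
f3 = ["a", "b1", "c", "d", "e", "f", "g"]
# f4 (hypothetical): the other edit sits deeper inside the -U5 context.
f4 = ["a", "b", "c", "d1", "e", "f", "g"]

print(apply_with_fuzz(f3, u1))              # applies with fuzz 0
print(apply_with_fuzz(f3, u5))              # applies, but only with fuzz 2
print(apply_with_fuzz(f3, u5, max_fuzz=1))  # None: fuzz limit too low
print(apply_with_fuzz(f4, u1))              # applies with fuzz 0
print(apply_with_fuzz(f4, u5))              # None even at the default max of 2
```

In this model the -U5 hunk on f3 succeeds only at fuzz 2, in line with the "succeeded at 1 with fuzz 2" output above, and a limit of 1 makes it fail. In the f4 case the other edit falls inside the part of the -U5 context that fuzz 2 still checks, so the larger-context hunk cannot be placed while the -U1 hunk applies cleanly.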

blubb answered Sep 19 '22 03:09