
Examples of different results produced by the standard (Myers), minimal, patience and histogram diff algorithms

Tags: git-diff, diff

Git offers these four diff algorithms, but gives no further information on how they differ.

What are the advantages of each of these algorithms? Is there a comparison of cases where the algorithms perform differently?

Petr asked Nov 13 '13 09:11


2 Answers

I think multiple algorithms are supported because no single algorithm is clearly the best choice in all cases.

They differ in the readability of the patch output and in the processing time needed to generate the patch.

In summary, this is how I understand the differences:

  • Myers: The original algorithm as implemented in xdiff (http://www.xmailserver.org/xdiff-lib.html and http://www.xmailserver.org/diff2.pdf), which optimizes the 'edit distance' over changed lines.
  • Minimal: Myers, plus extra effort to make the patch as small as possible.
  • Patience: Favors readability of the patch at the cost of patch size and processing time. See "What is `git diff --patience` for?" and http://bramcohen.livejournal.com/73318.html or http://alfedenzo.livejournal.com/170301.html for a description.
  • Histogram: Mainly created for speed. Faster than Myers and Patience; originally developed in JGit (http://eclipse.org/jgit/).
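To make the Patience idea more concrete, here is a minimal Python sketch of its central step as described in the posts linked above: pick the lines that occur exactly once in both files, then keep the longest set of them that appears in the same order on both sides. The function name `patience_anchors` is hypothetical, and this is not Git's xdiff code. The full algorithm also strips common leading and trailing lines and recurses into the gaps between the anchors, which this sketch leaves out.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(old, new):
    """Return (old_index, new_index) pairs for lines that occur exactly once
    in both inputs, limited to the longest set that keeps the same order."""
    old_counts, new_counts = Counter(old), Counter(new)
    # Position in `new` of every line that is unique there.
    new_pos = {line: j for j, line in enumerate(new) if new_counts[line] == 1}
    # Candidate matches, listed in the order they appear in `old`.
    pairs = [(i, new_pos[line]) for i, line in enumerate(old)
             if old_counts[line] == 1 and line in new_pos]

    # Longest increasing subsequence over the `new` indices (patience sorting).
    tails = []      # tails[k]: index into `pairs` ending the best run of length k + 1
    tail_vals = []  # the `new` index of each entry in `tails`, kept sorted
    prev = [None] * len(pairs)
    for p, (_, j) in enumerate(pairs):
        k = bisect_left(tail_vals, j)
        if k == len(tails):
            tails.append(p)
            tail_vals.append(j)
        else:
            tails[k] = p
            tail_vals[k] = j
        prev[p] = tails[k - 1] if k > 0 else None

    # Walk the back-pointers from the end of the longest run.
    anchors = []
    p = tails[-1] if tails else None
    while p is not None:
        anchors.append(pairs[p])
        p = prev[p]
    return anchors[::-1]


old = ["void f()", "{", "}", "void g()", "{", "}"]
new = ["void h()", "{", "}", "void f()", "{", "}", "void g()", "{", "}"]
print(patience_anchors(old, new))  # [(0, 3), (3, 6)]
```

The repeated `{` and `}` lines are ignored as anchors, so the function signatures line up with each other and the inserted `void h()` block shows up as one contiguous addition. That is the readability effect the Patience bullet refers to.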

Here is a comparison of speed for Myers, patience, and histogram: http://marc.info/?l=git&m=133103975225142&w=2

Here is a comparison of diff output of Histogram vs Myers: http://marc.info/?l=git&m=138023003519837&w=2
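To compare the output on your own files, something like the following Python sketch may help. It assumes `git` is on the PATH and relies on the `--no-index` and `--diff-algorithm=` options of `git diff`; the helper names `diff_command` and `compare_algorithms` are made up for this example.

```python
import subprocess

ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def diff_command(algorithm, old_path, new_path):
    """Build the git command line that diffs two plain files with one algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return ["git", "diff", "--no-index",
            f"--diff-algorithm={algorithm}", old_path, new_path]


def compare_algorithms(old_path, new_path):
    """Run git diff once per algorithm and return {algorithm: patch text}."""
    outputs = {}
    for algorithm in ALGORITHMS:
        # With --no-index, git exits with status 1 when the files differ,
        # so a non-zero exit code is not treated as an error here.
        result = subprocess.run(diff_command(algorithm, old_path, new_path),
                                capture_output=True, text=True)
        outputs[algorithm] = result.stdout
    return outputs
```

Calling `compare_algorithms("old.c", "new.c")` (hypothetical file names) returns one patch per algorithm, which can then be printed side by side or compared for equality. On many inputs all four patches will be identical; the differences tend to appear when blocks are moved or when many lines repeat.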

jelle foks answered Nov 16 '22 04:11


Although it compares only two algorithms, Myers and Histogram, a study by Nugroho et al. might help: it reveals the level of disagreement between the two diff algorithms. The study performed three comparisons: metrics, the SZZ algorithm, and patches. The metrics and SZZ comparisons show large differences between Myers and Histogram in the number of code changes each one identifies. Neither diff is incorrect in describing the changes. However, in the manual comparison of patches, the Histogram algorithm produced diff output that better describes the intent of the human change.

YusufUMS answered Nov 16 '22 05:11