
Using an alternate diff algorithm in Git

Tags: git, diff

Because git is designed for source code, its default diff algorithm treats a line as the minimum indivisible unit.

I am trying to edit some markdown files that are word wrapped at column 80. Adding a sentence can cause the rest of the paragraph to be marked as changed.

Is there a way to have Git use a diff algorithm more suited to text? I need one that treats words or sentences, rather than lines, as indivisible units.

John F. Miller asked May 14 '11 02:05




3 Answers

You might try git diff --word-diff instead.

$ git diff --word-diff
diff --git a/test.txt b/test.txt
index 54585bb..a8cd97e 100644
--- a/test.txt
+++ b/test.txt
@@ -1,7 +1,7 @@
Because git is designed for source code, its diff algorithms {+are bibbity +}
{+bobbity boo+} treat a line as the minimum indivisible unit. I am trying to edit 
some markdown files that are word wrapped at column 80. Adding a sentence can 
cause the rest of the paragraph to be marked as changed.

Is there a way to have Git use a diff algorithm more suited to text? One that 
treats words or sentences as indivisible units rather then lines?
 No newline at end of file
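To try this without touching a real repository, here is a minimal sketch that compares two scratch files with --no-index. The file names and sentences are made up for illustration, and --no-color is added only so the markers stay readable when the output is captured.

```shell
# Hypothetical scratch files; any two versions of a wrapped paragraph will do.
tmp=$(mktemp -d)
printf 'The quick brown fox jumps over\nthe lazy dog.\n' > "$tmp/old.txt"
printf 'The quick red fox jumps over\nthe lazy dog.\n' > "$tmp/new.txt"

# --no-index compares two files outside a repository.
# git diff exits with status 1 when the files differ, hence the "|| true".
out=$(git diff --no-index --no-color --word-diff "$tmp/old.txt" "$tmp/new.txt" || true)
printf '%s\n' "$out"

rm -rf "$tmp"
```

Only the changed word should be wrapped in [-...-] and {+...+} markers; the rest of the line is shown as unchanged context.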
Matthew Ratzloff answered Oct 06 '22 00:10


Maybe you are looking for word-diff

--word-diff[=<mode>]

Show a word diff, using the <mode> to delimit changed words. By default, words are delimited by whitespace; see --word-diff-regex below. The <mode> defaults to plain, and must be one of:

color

Highlight changed words using only colors. Implies --color.

plain

Show words as [-removed-] and {+added+}. Makes no attempts to escape the delimiters if they appear in the input, so the output may be ambiguous.

porcelain

Use a special line-based format intended for script consumption. Added/removed/unchanged runs are printed in the usual unified diff format, starting with a +, - or space character at the beginning of the line and extending to the end of the line. Newlines in the input are represented by a tilde ~ on a line of its own.

none

Disable word diff again.

Note that despite the name of the first mode, color is used to highlight the changed parts in all modes if enabled.

http://git-scm.com/docs/git-diff
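As a rough illustration of the porcelain mode, the sketch below pulls the added words out of a diff between two scratch files. The file names, the sample sentences and the awk filter are assumptions made for this example, not part of the Git documentation.

```shell
# Hypothetical scratch files for the demonstration.
tmp=$(mktemp -d)
printf 'The quick brown fox\n' > "$tmp/old.txt"
printf 'The quick red fox\n' > "$tmp/new.txt"

# git diff exits with status 1 when the files differ, hence the "|| true".
porcelain=$(git diff --no-index --no-color --word-diff=porcelain "$tmp/old.txt" "$tmp/new.txt" || true)

# Keep lines that start with "+", but skip the "+++ " file header line.
added=$(printf '%s\n' "$porcelain" | awk '/^\+\+\+ /{next} /^\+/{print substr($0, 2)}')
printf 'added words: %s\n' "$added"

rm -rf "$tmp"
```

Each run sits on its own line with a one-character prefix, so a script can filter on that first character instead of parsing the inline markers of the plain mode.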

manojlds answered Oct 06 '22 01:10


Here is an example of customising this (from this question). By default, --word-diff treats a word as a run of non-whitespace characters. The following command instead treats a word as one of the following:

  1. A string of alphanumeric characters and underscores
  2. A single non-whitespace character

The command:

git diff --color-words --word-diff-regex='[A-Za-z0-9_]+|[^[:space:]]'
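To see which pieces such a regex would count as words, one option is to feed a sample line through grep -oE. This is only an approximation: it assumes grep's extended regular expressions behave like Git's for this pattern. The sample text is invented, and the pattern spells out A-Za-z because the range A-z in ASCII order also covers a few punctuation characters between Z and a.

```shell
# Sample input is made up for illustration.
sample='foo_bar(baz) += 42;'

# -o prints each match on its own line; -E selects extended regular expressions.
tokens=$(printf '%s\n' "$sample" | grep -oE '[A-Za-z0-9_]+|[^[:space:]]')
printf '%s\n' "$tokens"
```

Identifiers and numbers come out whole, while every other non-whitespace character becomes a word of its own, so a change to a single bracket or operator is highlighted separately from the names around it.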
Casebash answered Oct 05 '22 23:10