
git word diff regex strange behaviour

I'm using Git to version prose and have been trying git diff --word-diff to see changes within lines. I want to use the results generated in a script.

But the default way that --word-diff identifies a word seems flawed. So I've been experimenting with --word-diff-regex= options.

Problem

Here are the two main flaws I'm trying to deal with:

  1. Added whitespace seems to be ignored. But whitespace can be quite important if trying to use the results programmatically.

    For example, take this header from a Markdown (.md) file:

    # Test file
    

    Now, let's add some text to the end of it:

    # Test file in Markdown
    

    If I run git diff --word-diff on this, I get:

    # Test file {+in Markdown+}
    

    But the space before the word "in" has not been included as part of the diff.

  2. Empty lines are completely ignored.

    Here's a standard git diff for the content of a file where I've removed a line and also added a couple of new lines -- one empty, the other with the text "Here's a new line."

     This is a test file to see how word diff responds in certain situations.
    -
     I'll try removing lines and adding them to see what happens.
    
     Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
    +
    +Here's a new line.
    

    But here's git diff --word-diff for the same content:

    This is a test file to see how word diff responds in certain situations.
    
    I'll try removing lines and adding them to see what happens.
    
    Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
    
    {+Here's a new line.+}
    

    The removed and added empty lines are completely ignored.

Desired results

Putting the two examples above together, here's what I'd like to see:

# Test file{+ in Markdown+}

This is a test file to see how word diff responds in certain situations.
{--}
I'll try removing lines and adding them to see what happens.

Here's another line so we can see what happens with line removals and additions. I want to see how `git diff --word-diff` handles it all!
{++}
{+Here's a new line.+}

Things I've tried:

  • git diff --word-diff-regex='.' seems too granular for when new words share characters with existing words
  • git diff --word-diff-regex='[^ ]+|[ ]' seems to solve the first problem but, to be honest, I'm not actually sure why.
  • git diff --word-diff-regex='[^ ]+|[ ]|^$' -- I was hoping the ^$ on the end would help capture empty lines -- but it doesn't and, worse, it then seems to ignore the change that follows.
  • git diff --word-diff-regex='[^ ]+|[ ]|.{0}' creates the same problem as the one before.
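
For what it's worth, a rough model of the tokenisation may explain the second result. The git documentation says every non-overlapping match of the regex is a word, and anything between matches is treated as whitespace and ignored. The Python sketch below approximates that for a single line; the function name words is made up for illustration, and Python's regex engine is not the POSIX one git uses, so treat it as an approximation rather than git's actual implementation.

```python
import re

def words(line, pattern):
    """Approximate word splitting for one line: every non-overlapping
    match is a word, and whatever lies between matches is dropped."""
    return re.findall(pattern, line)

new = "# Test file in Markdown"

# Default-like behaviour: words are runs of non-whitespace, so the
# spaces never become tokens and cannot show up as changes.
print(words(new, r"\S+"))
# ['#', 'Test', 'file', 'in', 'Markdown']

# With '[^ ]+|[ ]' each space is a one-character word of its own,
# so an added space can be reported like any other added word.
print(words(new, r"[^ ]+|[ ]"))
# ['#', ' ', 'Test', ' ', 'file', ' ', 'in', ' ', 'Markdown']
```

Under this model the first pattern loses the space before "in" because it falls between two matches, while the second keeps it because the space is itself a match.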

I'd be grateful if anyone could shed any light on how to do this, or at least share some knowledge on what's going on under the hood with --word-diff-regex.

Asked Oct 05 '19 14:10 by guypursey


1 Answer

The main thing stopping you from getting what you want is this, from https://git-scm.com/docs/diff-options:

A match that contains a newline is silently truncated(!) at the newline.

This means a match can never contain a line break, so a newline can never count as a word and word diffs will always ignore added or removed empty lines. I don't think you're going to achieve the results you want short of writing a custom diff generator.
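
As a starting point, here is a minimal sketch of what such a generator could look like in Python, using the standard difflib module. Everything in it is an assumption for illustration: the tokenizer (newlines, single spaces, and runs of other characters each count as a token), the function name word_diff, and the choice of git's plain-mode markers, [-...-] for removals and {+...+} for additions. It reads two strings rather than talking to git, so you would need to fetch the old and new file contents yourself.

```python
import difflib
import re

# Assumed tokenizer: a newline, a single space, or a run of any other
# characters. Every character belongs to some token, so nothing is
# discarded before diffing.
TOKEN = re.compile(r"\n|[ ]|[^ \n]+")

def word_diff(old, new):
    """Return `new` merged with `old`, with changes wrapped in markers."""
    a, b = TOKEN.findall(old), TOKEN.findall(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            out.append("[-" + "".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + "".join(b[j1:j2]) + "+}")
    return "".join(out)

print(word_diff("# Test file\n", "# Test file in Markdown\n"), end="")
# # Test file{+ in Markdown+}
```

Because newlines are tokens here, a removed or added empty line shows up as a marker wrapped around a literal line break. If you want it rendered as an empty marker on its own line, that would be a post-processing step on top of this sketch.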

Answered Nov 17 '22 05:11 by chaos