I'm trying to use git diff --word-diff-regex= and it seems to reject every kind of lookahead and lookbehind. I'm having trouble pinning down which flavor of regex Git uses. For example:
git diff --word-diff-regex='([.\w]+)(?!>)'
This comes back as an invalid regular expression.
I am trying to match all the words that are not HTML tags, so for the string below the matches should be 'Hello', 'World', 'Foo', and 'Bar':
<p> Hello World </p><p> Foo Bar </p>
The Git source uses regcomp and regexec, which are defined by POSIX 1003.2. The code to compile a diff regexp is:
    if (regcomp(ecbdata->diff_words->word_regex,
                o->word_regex,
                REG_EXTENDED | REG_NEWLINE))
which in POSIX means that these are "extended" regular expressions (EREs), as defined in the POSIX specification.
(Not every C library actually implements the same POSIX REG_EXTENDED. Git includes its own implementation, which can be built in place of the system's.)
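To see how a given C library treats a pattern, you can compile it with the same flags outside Git. The following is a minimal sketch, not Git's code: check_word_regex is a hypothetical helper name, and it only assumes a POSIX <regex.h>. A lookahead such as (?!>) is likely to be rejected here, although the exact result depends on the regex implementation your build links against.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Git): compile `pattern` with the same
 * flags as the snippet above. Returns 0 if the pattern is accepted,
 * otherwise the nonzero regcomp() error code. If `msg` is not NULL, the
 * library's own error description is written into it.
 */
int check_word_regex(const char *pattern, char *msg, size_t msg_size)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE);

    if (rc != 0) {
        if (msg != NULL && msg_size > 0)
            regerror(rc, &re, msg, msg_size);
        return rc;
    }

    regfree(&re); /* only a successfully compiled regex needs freeing */
    return 0;
}
```

Calling check_word_regex(pattern, buf, sizeof buf) from a small main and printing buf on a nonzero return shows the message your library produces for that pattern.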
Edit (per updated question): POSIX EREs have neither lookahead nor lookbehind, nor do they have \w (but [_[:alnum:]] is probably close enough for most purposes).
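One way to get the effect of the lookahead without leaving POSIX ERE is to match a whole tag as its own alternative and discard those matches afterwards. The sketch below illustrates that idea under stated assumptions: the pattern <[^>]*>|[._[:alnum:]]+ and the helper name words_outside_tags are my own and do not come from Git, and [._[:alnum:]] stands in for the [.\w] of the question.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper: copies up to max_words words that are not part of
 * an HTML tag from `text` into `words`. Returns the number of words
 * stored, or -1 if the pattern does not compile.
 */
int words_outside_tags(const char *text, char words[][64], int max_words)
{
    /* Either a complete tag, or a run of "word" characters. */
    const char *pattern = "<[^>]*>|[._[:alnum:]]+";
    regex_t re;
    regmatch_t m;
    int count = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;

    while (count < max_words && regexec(&re, text, 1, &m, 0) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);

        if (len == 0)
            break; /* defensive: this pattern should never match empty */

        /* A match starting with '<' is a tag: skip it, keep the rest. */
        if (text[m.rm_so] != '<' && len < 64) {
            memcpy(words[count], text + m.rm_so, len);
            words[count][len] = '\0';
            count++;
        }
        text += m.rm_eo; /* continue searching after this match */
    }

    regfree(&re);
    return count;
}
```

Because the tag alternative consumes everything from < to >, the p inside <p> is never reported as a separate word. The same pattern could be passed to --word-diff-regex, in which case each tag would be treated as a single word rather than excluded.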