Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Example of "use \G in negative variable-length lookbehinds to limit how far back the lookbehind goes"

Tags:

python

regex

In the pypi page of the awesome regex module (https://pypi.python.org/pypi/regex) it is stated that \G can be used "in negative variable-length lookbehinds to limit how far back the lookbehind goes". Very interesting, but the page doesn't give any example and my white-belt regex-fu simply chokes when I try to imagine one.

Could anyone describe some sample use case?

like image 496
Ron Avatar asked Dec 19 '14 09:12

Ron


1 Answers

Here's an example that uses \G and a negative lookbehind creatively:

regex.match(r'\b\w+\b(?:\s(\w+\b)(?<!\G.*\b\1\b.*\b\1\b))*', words)

words should be a string of alphanumeric characters separated by a single whitespace, for example "a b c d e a b b c d".

The pattern will match a sequence of unique words.

  • \w+ - Match the first word.
  • (?:\s(\w+\b) )* - match additional words ...
  • (?<!\G.*\b\1\b.*\b\1\b) - ... but for each new word added, check it didn't already appear until we get to \G.

A lookbehind at the end of the pattern that is limited at \G can assert another condition on the current match, which would not have been possible otherwise. Basically, the pattern is a variation on using lookaheads for AND logic in regular expressions, but is not limited to the whole string.

Here's a working example in .Net, which shares the same features.
Trying the same pattern in Python 2 with findall and the regex module gives me a segmentation fault, but match seems to work.

like image 159
Kobi Avatar answered Oct 10 '22 00:10

Kobi