
R: (*SKIP)(*FAIL) for multiple patterns

Tags: regex, r, boolean

Given test <- c('met','meet','eel','elm'), I need a single line of code that matches any 'e' that is not in 'me' or 'ee'. I wrote (ee|me)(*SKIP)(*F)|e, which does exclude 'met' and 'eel', but not 'meet'. Is this because | is exclusive or? At any rate, is there a solution that just returns 'elm'?

For the record, I know I can also do (?<![me])e(?!e), but I would like to know what the solution is for (*SKIP)(*F) and why my line is wrong.

asked Apr 15 '15 00:04 by dasf



1 Answer

This is the correct solution with (*SKIP)(*F):

(?:me+|ee+)(*SKIP)(*FAIL)|e

Demo on regex101, using the following test cases:

met
meet
eel
elm
degree
zookeeper
meee

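Only the e in elm, the first e in degree, and the last e in zookeeper are matched.

Since an e in me is forbidden, any e directly after an m is forbidden; since an e in ee is forbidden, every e in a run of two or more consecutive e's is forbidden. This explains the sub-pattern (?:me+|ee+): me+ consumes an m together with the whole run of e's that follows it, and ee+ consumes any run of two or more e's.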

While I am aware that this method is not extensible, it is at least logically correct.
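
A minimal sketch of how this could be checked in R, assuming base R with perl = TRUE so that the PCRE verbs are available. The variable names are only illustrative:

```r
# PCRE verbs such as (*SKIP) and (*FAIL) need perl = TRUE in base R.
pattern <- "(?:me+|ee+)(*SKIP)(*FAIL)|e"
test <- c("met", "meet", "eel", "elm", "degree", "zookeeper", "meee")

# Which strings contain at least one allowed e?
has_match <- grepl(pattern, test, perl = TRUE)
print(test[has_match])

# 1-based start positions of every allowed e (-1 means no match).
positions <- lapply(gregexpr(pattern, test, perl = TRUE), as.integer)
names(positions) <- test
print(positions)
```

Under these assumptions, only elm, degree and zookeeper should be reported, with the matched e at positions 1, 2 and 8 respectively.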

Analysis of other solutions

Solution 0

(ee|me)(*SKIP)(*F)|e

Let's use meet as an example:

meet        # (ee|me)(*SKIP)(*F)|e
^           # ^

meet        # (ee|me)(*SKIP)(*F)|e
  ^         #        ^

meet        # (ee|me)(*SKIP)(*F)|e
  ^         #               ^
            # Backtracking into the pattern to the left is now forbidden
            # On failure, the next attempt (bump along) starts at the current position

meet        # (ee|me)(*SKIP)(*F)|e
  ^         #                  ^
            # Pattern failed. No choice left. Bump along.
            # Note that backtracking to before (*SKIP) is forbidden,
            # so e in second branch is not tried

meet        # (ee|me)(*SKIP)(*F)|e
  ^         # ^
            # Can't match ee or me. Try the other branch

meet        # (ee|me)(*SKIP)(*F)|e
   ^        #                    ^
            # Found a match `e`

The problem is that me consumes the first e. When matching resumes at the second e, ee can no longer match there, so the second e is left available to the plain e branch.
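
The trace above can be reproduced with a short R sketch, again assuming perl = TRUE. The names solution0 and pos_solution0 are illustrative only:

```r
# Solution 0 applied to "meet"; perl = TRUE enables the PCRE verbs.
solution0 <- "(ee|me)(*SKIP)(*F)|e"

# regexpr() gives the 1-based start of the first match, or -1 if there is none.
pos_solution0 <- as.integer(regexpr(solution0, "meet", perl = TRUE))
print(pos_solution0)
```

The reported position should be 3, that is, the second e of meet, which is the e that was supposed to be excluded.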

Solution 1

\w*(ee|me)\w*(*SKIP)(*FAIL)|e

This just skips every word that contains ee or me, so it fails to match anything in degree and zookeeper.
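
As a rough check in R (perl = TRUE assumed, variable names illustrative), note that the backslashes must be doubled inside an R string literal:

```r
# Solution 1: the surrounding \w* swallows the whole word before (*SKIP) is reached.
solution1 <- "\\w*(ee|me)\\w*(*SKIP)(*FAIL)|e"
words <- c("degree", "zookeeper")

# TRUE would mean that some e in the word was matched.
found_solution1 <- grepl(solution1, words, perl = TRUE)
print(found_solution1)
```

Both entries should come back FALSE, even though each word contains one e that is neither in me nor in ee.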


Solution 2

(?:ee|mee?)(*SKIP)(?!)|e

Similar problem to Solution 0. When an m is followed by three e's in a row (as in meee), the first two e's are consumed by mee?, leaving the third e available for matching.
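
A small R sketch of this case, with perl = TRUE assumed and illustrative names:

```r
# Solution 2 applied to "meee": mee? can consume at most two e's after the m.
solution2 <- "(?:ee|mee?)(*SKIP)(?!)|e"

# 1-based start of the first match, or -1 if nothing matches.
pos_solution2 <- as.integer(regexpr(solution2, "meee", perl = TRUE))
print(pos_solution2)
```

The position reported should be 4, the third e, which sits inside a run of consecutive e's and should therefore have been excluded.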

Solution 3

(?:^.*[me]e)(*SKIP)(*FAIL)|e

This throws away the input up to the last me or ee, which means that any valid e before the last me or ee will not be matched, such as the first e in degree.
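
For comparison, a sketch in R (perl = TRUE assumed, names illustrative) that runs Solution 3 and the (?:me+|ee+) pattern on degree:

```r
# Solution 3 discards everything up to the last me/ee, including valid e's before it.
solution3 <- "(?:^.*[me]e)(*SKIP)(*FAIL)|e"
corrected <- "(?:me+|ee+)(*SKIP)(*FAIL)|e"

# 1-based start of the first match in "degree", or -1 if nothing matches.
pos_solution3 <- as.integer(regexpr(solution3, "degree", perl = TRUE))
pos_corrected <- as.integer(regexpr(corrected, "degree", perl = TRUE))
print(c(solution3 = pos_solution3, corrected = pos_corrected))
```

Solution 3 should report -1 (no match), while the (?:me+|ee+) pattern should report 2, the first e in degree.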


answered Sep 28 '22 02:09 by nhahtdh
