Nested regex lookahead and lookbehind

I am having problems with nested positive/negative lookahead/lookbehind in regex.

Let's say I want to replace each '*' in a string with '%', and that '\' escapes the next character. (I'm turning a regex into an SQL LIKE pattern ^^).

So the string

  • '*test*' should be changed to '%test%',
  • '\\*test\\*' -> '\\%test\\%', but
  • '\*test\*' and '\\\*test\\\*' should stay the same.
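
For reference, the intended behaviour can be written out without any regex. This is only a minimal Python sketch of the rules listed above; the function name `replace_unescaped_stars` is made up for illustration, and it assumes a lone trailing backslash should simply be copied through:

```python
def replace_unescaped_stars(text: str) -> str:
    """Replace each unescaped '*' with '%'; a backslash escapes the next character."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Copy the backslash and the character it escapes unchanged.
            out.append(text[i:i + 2])
            i += 2
        else:
            # Assumption: a lone trailing backslash is copied as-is.
            out.append("%" if ch == "*" else ch)
            i += 1
    return "".join(out)


for sample in (r"*test*", r"\\*test\\*", r"\*test\*", r"\\\*test\\\*"):
    print(sample, "->", replace_unescaped_stars(sample))
```

Any regex solution should agree with this function on the four strings above.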

I tried:

(?<!\\)(?=\\\\)*\*      but this doesn't work
(?<!\\)((?=\\\\)*\*)    ...
(?<!\\(?=\\\\)*)\*      ...
(?=(?<!\\)(?=\\\\)*)\*  ...

What is the correct regex that will match the '*'s in examples given above?

What is the difference between (?<!\\(?=\\\\)*)\* and (?=(?<!\\)(?=\\\\)*)\*? Or, if both are fundamentally wrong, what is the difference between regexes that are structured this way?

asked Oct 23 '11 15:10 by bliof


2 Answers

So you essentially want to match * only if it's preceded by an even number of backslashes (or, in other words, if it isn't escaped)? Then you don't need lookahead at all since you're only looking back, aren't you?

Search for

(?<=(?<!\\)(?:\\\\)*)\*

and replace with %.

Explanation:

(?<=       # Assert that it's possible to match before the current position...
 (?<!\\)   # (unless there are more backslashes before that)
 (?:\\\\)* # an even number of backslashes
)          # End of lookbehind
\*         # Then match an asterisk

answered Sep 23 '22 01:09 by Tim Pietzcker


To find an unescaped character, you would look for a character that is preceded by an even number of (or zero) escape characters. This is relatively straightforward.

(?<=(?<!\\)(?:\\\\)*)\*        # this is explained in Tim Pietzcker's answer

Unfortunately, many regex engines do not support variable-length look-behind, so we have to substitute with look-ahead:
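
Python's built-in `re` module is one such engine, as far as I know. The following sketch just checks whether a pattern compiles; the helper name `compiles` is made up for illustration:

```python
import re

# The variable-length look-behind pattern shown above.
VARIABLE_LOOKBEHIND = r"(?<=(?<!\\)(?:\\\\)*)\*"


def compiles(pattern: str) -> bool:
    """Return True if Python's built-in `re` module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # `re` reports that look-behind requires a fixed-width pattern.
        return False


print(compiles(VARIABLE_LOOKBEHIND))  # variable-length look-behind
print(compiles(r"(?<!\\)\*"))         # fixed-width look-behind
```

The first pattern is rejected because `(?:\\\\)*` inside the look-behind can match any even number of backslashes; the second is accepted because its look-behind is always exactly one character wide.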

(?=(?<!\\)(?:\\\\)*\*)(\\*)\*  # also look at ridgerunner's improved version

Replace this with the contents of group 1 and a % sign.

Explanation

(?=           # start look-ahead
  (?<!\\)     #   a position not preceded by a backslash (via look-behind)
  (?:\\\\)*   #   an even number of backslashes (don't capture them)
  \*          #   a star
)             # end look-ahead. If found,
(             # start group 1
  \\*         #   match any number of backslashes in front of the star
)             # end group 1
\*            # match the star itself

The look-ahead makes sure the star is matched only when an even number of backslashes precedes it. Those backslashes still have to be matched into a group and put back by the replacement, since the look-ahead does not advance the position in the string.
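
As a quick check, here is how the look-ahead pattern could be applied in Python, whose `re` module accepts it because the only look-behind left is one character wide. This is a sketch, not the only way to do it; the names `UNESCAPED_STAR` and `star_to_percent_regex` are made up for illustration:

```python
import re

# Look-ahead variant from above; group 1 holds the backslashes in front of the star.
UNESCAPED_STAR = re.compile(r"(?=(?<!\\)(?:\\\\)*\*)(\\*)\*")


def star_to_percent_regex(text: str) -> str:
    # Put the captured backslashes back, then emit '%' in place of the '*'.
    return UNESCAPED_STAR.sub(r"\g<1>%", text)


for sample in (r"*test*", r"\\*test\\*", r"\*test\*", r"\\\*test\\\*"):
    print(sample, "->", star_to_percent_regex(sample))
```

The first two samples have their stars replaced, and the last two come back unchanged, matching the examples in the question.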

answered Sep 25 '22 01:09 by Tomalak