I am having problems with nested positive/negative lookahead/lookbehind in regex.
Let's say that I want to replace each '*' in a string with '%', and let's say that '\' escapes the next character (turning a regex into an SQL LIKE pattern ^^).
So the string '*test*' should be changed to '%test%', and '\\*test\\*' to '\\%test\\%', but '\*test\*' and '\\\*test\\\*' should stay the same.
I tried:
(?<!\\)(?=\\\\)*\* but this doesn't work
(?<!\\)((?=\\\\)*\*) ...
(?<!\\(?=\\\\)*)\* ...
(?=(?<!\\)(?=\\\\)*)\* ...
What is the correct regex that will match the '*'s in examples given above?
What is the difference between (?<!\\(?=\\\\)*)\* and (?=(?<!\\)(?=\\\\)*)\*, or, if these are essentially wrong, what is the difference between regexes that have such a visual construction?
Lookahead lets you add a condition for “what follows”. Lookbehind is similar, but it looks behind: it allows a pattern to match only if there's something specific before it.
Positive lookbehind is used to match a phrase that is preceded by user-specified text. Its syntax is (?<=a)something, and it can be combined with any other regex construct. That pattern matches any "something" that is preceded by an "a".
In a negative lookbehind, the regex engine first finds a candidate for the main match, then looks back and tries to match the given item just before it. If that look-back succeeds, the overall match fails; otherwise it succeeds.
The positive lookahead construct is a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign. Any valid regular expression can be used inside the lookahead; lookbehind is more restricted in many engines.
So you essentially want to match * only if it's preceded by an even number of backslashes (or, in other words, if it isn't escaped)? Then you don't need lookahead at all, since you're only looking back, aren't you?
Search for
(?<=(?<!\\)(?:\\\\)*)\*
and replace with %.
Explanation:
(?<= # Assert that it's possible to match before the current position...
(?<!\\) # (unless there are more backslashes before that)
(?:\\\\)* # an even number of backslashes
) # End of lookbehind
\* # Then match an asterisk
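Whether this pattern compiles at all depends on the regex engine, because the `(?:\\\\)*` inside the lookbehind has no fixed width. As a quick check, here is a minimal sketch using Python's built-in `re` module (my choice of engine, not something the question specifies); the helper name `compiles_in_stdlib_re` is made up for illustration.

```python
import re

# Lookbehind-based pattern from the answer above. The inner (?:\\\\)* can
# match 0, 2, 4, ... characters, so the lookbehind is not fixed-width.
VARIABLE_LOOKBEHIND = r'(?<=(?<!\\)(?:\\\\)*)\*'

# A plain fixed-width lookbehind, for comparison.
FIXED_LOOKBEHIND = r'(?<!\\)\*'


def compiles_in_stdlib_re(pattern: str) -> bool:
    """Return True if Python's built-in `re` accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles_in_stdlib_re(VARIABLE_LOOKBEHIND))  # False
print(compiles_in_stdlib_re(FIXED_LOOKBEHIND))     # True
```

Engines that do allow variable-length lookbehind should accept the pattern as written.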
To find an unescaped character, you would look for a character that is preceded by an even number of (or zero) escape characters. This is relatively straightforward.
(?<=(?<!\\)(?:\\\\)*)\* # this is explained in Tim Pietzcker's answer
Unfortunately, many regex engines do not support variable-length look-behind, so we have to substitute with look-ahead:
(?=(?<!\\)(?:\\\\)*\*)(\\*)\* # also look at ridgerunner's improved version
Replace this with the contents of group 1 and a % sign.
Explanation
(?= # start look-ahead
(?<!\\) # a position not preceded by a backslash (via look-behind)
(?:\\\\)* # an even number of backslashes (don't capture them)
\* # a star
) # end look-ahead. If found,
( # start group 1
\\* # match any number of backslashes in front of the star
) # end group 1
\* # match the star itself
The look-ahead makes sure the star is matched only when it is preceded by an even number of backslashes (possibly zero). The backslashes still have to be matched into a group and put back in the replacement, since the look-ahead does not advance the position in the string.
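To see the look-ahead version in action, here is a small sketch in Python, whose built-in `re` module only supports fixed-width lookbehind. The function name `stars_to_percent` and the sample list are mine; the inputs are the four strings from the question.

```python
import re

# Look-ahead variant from this answer; the only lookbehind left, (?<!\\),
# is fixed-width, so the pattern compiles in Python's `re`.
UNESCAPED_STAR = re.compile(r'(?=(?<!\\)(?:\\\\)*\*)(\\*)\*')


def stars_to_percent(text: str) -> str:
    """Replace each unescaped '*' with '%', keeping the backslashes before it."""
    # \1 puts back the backslashes captured in group 1.
    return UNESCAPED_STAR.sub(r'\1%', text)


samples = [r'*test*', r'\\*test\\*', r'\*test\*', r'\\\*test\\\*']
for s in samples:
    print(s, '->', stars_to_percent(s))
# *test* -> %test%
# \\*test\\* -> \\%test\\%
# \*test\* -> \*test\*
# \\\*test\\\* -> \\\*test\\\*
```

The raw strings keep the backslash counts identical to what the question shows, so the first two inputs change and the last two stay as they are.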