Okay, so I'm trying to use a regular expression to match instances of a character only if it hasn't been escaped (with a backslash), and decided to use a negative look-behind like so:
(?<!\\)[*]
This succeeds and fails as expected with strings such as foo* and foo\* respectively.
However, it doesn't work for strings such as foo\\*, i.e. where the special character is preceded by a back-slash escaping another back-slash (an escape sequence that is itself escaped). The * there is unescaped, yet the look-behind rejects it.
Is it possible to use a negative look-behind (or some other technique) to skip special characters only if they are preceded by an odd number of back-slashes?
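To make the problem concrete, here is a minimal sketch of the pattern from the question. Python's `re` module is used only for illustration (the question does not name an engine here), and `has_unescaped_star` is a hypothetical helper name.

```python
import re

# The look-behind from the question: a '*' not directly preceded by a backslash.
NAIVE = re.compile(r'(?<!\\)[*]')

def has_unescaped_star(text):
    """Return True if NAIVE finds a '*' that it considers unescaped."""
    return NAIVE.search(text) is not None

print(has_unescaped_star(r'foo*'))    # True: the star is not escaped
print(has_unescaped_star(r'foo\*'))   # False: the star is escaped
print(has_unescaped_star(r'foo\\*'))  # False, although this star is not escaped
```

The last call shows the gap: the look-behind only sees the single character before the star, so it cannot tell an escaping back-slash from an escaped one.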
To match a character that has special meaning in regex, you need to escape it by prefixing it with a backslash ( \ ). E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" .
In regular expressions, the period "." matches any single character. To match one character out of a given set, use a character class.
The ?: prefix indicates that the subpattern is a non-capture subpattern. That means whatever is matched in (?:\w+\s) , even though it's enclosed by () , won't appear in the list of matches; only (\w+) will.
The regular expression [A-Z][a-z]* matches any sequence of letters that starts with an uppercase letter and is followed by zero or more lowercase letters.
I've found the following solution, which works for NSRegularExpression but also works in every regexp implementation I've tried that supports negative look-behinds:
(?<!\\)(?:(\\\\)*)[*]
In this case the non-capturing group consumes any pairs of back-slashes, effectively eliminating them, so the negative look-behind only has to reject a remaining unpaired (odd-numbered) back-slash. Note that the match then includes those back-slash pairs as well as the special character.
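As a rough check of the pattern above, here is a small sketch using Python's `re` module, which supports fixed-width negative look-behinds. The choice of Python and the helper name `find_unescaped_stars` are assumptions made for illustration; the answer itself targets NSRegularExpression.

```python
import re

# Pattern from the answer: no backslash before, then pairs of backslashes, then '*'.
PAIRED = re.compile(r'(?<!\\)(?:(\\\\)*)[*]')

def find_unescaped_stars(text):
    """Return the index of every '*' preceded by an even number of backslashes."""
    # Each match ends on the star, so the star sits at end() - 1.
    return [m.end() - 1 for m in PAIRED.finditer(text)]

print(find_unescaped_stars(r'foo*'))     # [3]
print(find_unescaped_stars(r'foo\*'))    # []
print(find_unescaped_stars(r'foo\\*'))   # [5]
print(find_unescaped_stars(r'foo\\\*'))  # []
```

Because the back-slash pairs are part of each match, the sketch reads the star's position from the end of the match instead of its start.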
A lookbehind cannot solve this problem in general. The reliable way is to match escaped characters first, so that they are skipped, and only then look for unescaped characters.
You can isolate the unescaped character from the result with a capture group:
(?:\\.)+|(\*)
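or with the \K feature (pcre/perl/ruby), which drops everything matched to its left from the result. The pattern has to be anchored with \G and consume every preceding character, otherwise a match attempt that starts directly on an escaped * would still succeed:
\G(?:[^\\*]|\\.)*\K\*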
or using backtracking control verbs (pcre/perl) to skip escaped characters:
(?:\\.)+(*SKIP)(*FAIL)|\*
The only case where you can use a lookbehind is the .NET framework, which allows unlimited-length lookbehinds:
(?<!(?:[^\\]|\A)(?:\\\\)*\\)\*
or, in a more limited way, Java, where the repetition inside the lookbehind has to be bounded:
(?<!(?:[^\\]|\A)(?:\\\\){0,1000}\\)\*
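The capture-group variant is the most portable of these, since it needs neither \K, backtracking verbs, nor variable-length lookbehinds. A minimal sketch in Python's `re` module follows; the language choice and the helper name `unescaped_star_positions` are assumptions made for illustration.

```python
import re

# Alternation from the answer: runs of escaped characters first, else a bare '*'.
SKIP_ESCAPED = re.compile(r'(?:\\.)+|(\*)')

def unescaped_star_positions(text):
    """Return the index of every '*' that is not part of an escape sequence."""
    # Group 1 is set only when the second alternative (a bare star) matched;
    # matches of the first alternative are escaped runs and are ignored.
    return [m.start(1) for m in SKIP_ESCAPED.finditer(text)
            if m.group(1) is not None]

print(unescaped_star_positions(r'a*b\*c\\*'))  # [1, 8]
print(unescaped_star_positions(r'foo\\\*'))    # []
```

The first alternative consumes each back-slash together with the character it escapes, so the scan never starts a match attempt in the middle of an escape sequence.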