
Regular Expression to Match Unescaped Characters Only

Tags: regex, swift
Okay, so I'm trying to use a regular expression to match instances of a character only if it hasn't been escaped (with a backslash), and decided to use a negative look-behind like so:

(?<!\\)[*]

This succeeds and fails as expected with strings such as foo* and foo\* respectively.

However, it doesn't work for strings such as foo\\*, i.e. where the special character is preceded by a back-slash that is itself escaped by another back-slash, so the special character should count as unescaped.
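The behaviour described so far can be reproduced with a short sketch. Python's `re` module is used here purely for illustration (an assumption, since the question targets NSRegularExpression), and the sample strings are the three from the question; the `samples` and `results` names are made up for this example.

```python
import re

# The pattern from the question: an asterisk not directly preceded by a backslash.
naive = re.compile(r"(?<!\\)[*]")

# Each sample maps to what the asker wants, not to what the pattern does.
samples = {
    "foo*": "unescaped",      # foo*   : no backslash at all
    "foo\\*": "escaped",      # foo\*  : one backslash escapes the asterisk
    "foo\\\\*": "unescaped",  # foo\\* : the backslash itself is escaped
}

# True where the naive pattern finds an asterisk, False otherwise.
results = {text: bool(naive.search(text)) for text in samples}

for text, found in results.items():
    print(f"{text!r:12} wanted: {samples[text]:9} matched: {found}")
```

The third string is the problem case: its asterisk is wanted as unescaped, yet the look-behind sees a back-slash directly before it and rejects it.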

Is it possible to use a negative look-behind (or some other technique) to skip special characters only if they are preceded by an odd number of back-slashes?

Haravikk asked Jan 23 '15 16:01



2 Answers

I've found the following solution which works for NSRegularExpression but also works in every regexp implementation I've tried that supports negative look-behinds:

(?<!\\)(?:(\\\\)*)[*]

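In this case the second, non-capturing group consumes any pairs of back-slashes directly before the asterisk, effectively eliminating them, so the negative look-behind only has to reject a single remaining (unpaired) back-slash. Note that the consumed back-slash pairs become part of the overall match, so the asterisk is the last character of the match rather than the whole match.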
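As a rough check of this pattern, the sketch below runs it through Python's `re` module (an assumption made for illustration; the answer itself was written against NSRegularExpression). The helper name `unescaped_star_positions` is hypothetical.

```python
import re

# Pattern from this answer: optional pairs of backslashes followed by an
# asterisk, where the whole run is not preceded by another backslash.
pattern = re.compile(r"(?<!\\)(?:(\\\\)*)[*]")

def unescaped_star_positions(text):
    """Return the index of every asterisk the pattern accepts as unescaped."""
    # The match may begin with the backslash pairs before the asterisk,
    # so the asterisk itself sits at m.end() - 1, not at m.start().
    return [m.end() - 1 for m in pattern.finditer(text)]

for sample in ["foo*", "foo\\*", "foo\\\\*", "foo\\\\\\*"]:
    print(repr(sample), unescaped_star_positions(sample))
```

Because the back-slash pairs are part of each match, a replacement built on this pattern would also replace those back-slashes unless they are put back, for instance through the capture group.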

Haravikk answered Oct 11 '22 06:10


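A lookbehind alone cannot solve this problem in most regex flavors, because the number of back-slashes before the character is not fixed. The general approach is to match escaped characters first, so that they are consumed, and only then look for unescaped characters.

You can isolate the unescaped character from the result with a capture group: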

(?:\\.)+|(\*)

or with the \K feature (PCRE/Perl/Ruby), which drops everything matched to its left from the result:

(?:\\.)*\K\*

or using backtracking control verbs (PCRE/Perl) to skip escaped characters:

(?:\\.)+(*SKIP)(*FAIL)|\*

The only case where you can use a lookbehind on its own is the .NET framework, which allows unlimited-length lookbehinds:

(?<!(?:[^\\]|\A)(?:\\\\)*\\)\*

or, in a more limited way, with Java, using a bounded quantifier in place of the unbounded one:

(?<!(?:[^\\]|\A)(?:\\\\){0,1000}\\)\*
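Of the patterns in this answer, the capture-group version, (?:\\.)+|(\*), uses only syntax that Python's `re` module accepts, so it can be tried there as a minimal sketch (the \K, control-verb and variable-length lookbehind versions are not supported by that module). The helper name `unescaped_stars` is hypothetical.

```python
import re

# First alternative: swallow runs of "backslash + any character" pairs.
# Second alternative: capture an asterisk that was not swallowed that way.
pattern = re.compile(r"(?:\\.)+|(\*)")

def unescaped_stars(text):
    """Return the index of every asterisk that lands in capture group 1."""
    return [
        m.start(1)
        for m in pattern.finditer(text)
        if m.group(1) is not None  # group 1 is unset for escaped runs
    ]

for sample in ["foo*", "foo\\*", "foo\\\\*", "a*b\\*c\\\\*"]:
    print(repr(sample), unescaped_stars(sample))
```

The design relies on the scan going left to right: every back-slash is paired with the character after it before an asterisk can be tested, so only matches where group 1 participated are kept.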
Casimir et Hippolyte answered Oct 11 '22 07:10