I have the following regular expression for eliminating spaces, tabs, and new lines: [^ \n\t]
However, I want to expand this for certain additional characters, such as > and <.
I tried [^ \n\t<>], which works well for now, but I want the expression to not match if the < or > is preceded by a \.
I tried [^ \n\t[^\\]<[^\\]>], but this did not work.
Can any one of the sequences below occur in your input?
\\>
\\\>
\\\\>
\ followed by a blank
\ followed by a tab
\ followed by a newline
...
If so, how do you propose to treat them?
If not, then a zero-width look-behind assertion will do the trick, provided that your regular expression engine supports it. This is the case in any engine that supports Perl-style regular expressions (including Perl's, PHP's, etc.):
(?<!\\)[ \n\t<>]
The above will match any unescaped space, newline, tab or angle bracket. More generically (using \s to denote any whitespace character, including \r):
(?<!\\)\s
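As a rough illustration of the two look-behind patterns, here is a minimal Python sketch. The sample string and the variable names are made up for this example; Python's re module is just one engine that supports this kind of look-behind.

```python
import re

# Space, newline, tab, < or > that is NOT preceded by a backslash.
UNESCAPED_DELIM = re.compile(r'(?<!\\)[ \n\t<>]')
# Same idea for any whitespace character (including \r).
UNESCAPED_SPACE = re.compile(r'(?<!\\)\s')

sample = r'a\<b <c> d\ e'  # hypothetical input

# Offsets of every unescaped delimiter / whitespace character.
delim_positions = [m.start() for m in UNESCAPED_DELIM.finditer(sample)]
space_positions = [m.start() for m in UNESCAPED_SPACE.finditer(sample)]

# Splitting on unescaped delimiters and dropping empty pieces
# leaves the tokens, with escaped characters kept inside them.
tokens = [t for t in UNESCAPED_DELIM.split(sample) if t]

print(delim_positions)  # [4, 5, 7, 8]
print(space_positions)  # [4, 8]
print(tokens)           # ['a\\<b', 'c', 'd\\ e']
```

Note that the look-behind only inspects one character, so an input such as \\> (an escaped backslash followed by >) is still treated as an escaped bracket here.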
Alternatively, using complementary notation without the need for a zero-width look-behind assertion (but arguably less efficiently):
(?:\\[<>]|[^ \n\t<>])
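A small Python sketch of the complementary approach may help; the sample input and names are hypothetical. It tries the escaped-bracket branch first. That ordering is my own choice for scanning: with leftmost-first alternation, listing the single-character class first lets it consume the backslash on its own, so the bracket after it is no longer seen as escaped.

```python
import re

# Escaped bracket first, then any single non-delimiter character.
TOKEN = re.compile(r'(?:\\[<>]|[^ \n\t<>])+')
# Same two branches in the opposite order, for comparison only.
CLASS_FIRST = re.compile(r'(?:[^ \n\t<>]|\\[<>])+')

sample = r'a\<b <c> d'  # hypothetical input

tokens = TOKEN.findall(sample)
class_first_tokens = CLASS_FIRST.findall(sample)

print(tokens)              # ['a\\<b', 'c', 'd']
print(class_first_tokens)  # ['a\\', 'b', 'c', 'd']
```

In an anchored match (for example with ^ and $ around the repeated group) backtracking would make both orders accept the same strings; the difference only shows up when scanning for tokens as above.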
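You may also use a variation of the latter to handle the \\>, \\\>, \\\\>, etc. cases as well, up to some finite number of preceding backslashes. An odd run of backslashes (here 1, 3, 5, 7 or 9, written as one backslash plus up to four backslash pairs) that is not itself preceded by a backslash marks the bracket as escaped, such as:
(?:(?:^|[^\\])\\(?:\\\\){0,4}[<>]|[^ \n\t<>])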
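The parity idea can be checked in isolation. The sketch below is only an illustration in Python with a hypothetical helper name: it tests whether a string consists of an odd run of backslashes (1, 3, 5, 7 or 9, expressed as one backslash plus zero to four backslash pairs) followed by an angle bracket.

```python
import re

# One backslash plus zero to four backslash pairs = 1, 3, 5, 7 or 9
# backslashes, followed by < or >.
ODD_RUN_BRACKET = re.compile(r'\\(?:\\\\){0,4}[<>]')

def is_escaped_bracket(text):
    """Return True if text is an odd run (1 to 9) of backslashes plus < or >."""
    return ODD_RUN_BRACKET.fullmatch(text) is not None

# Try runs of 0 to 11 backslashes in front of '>'.
results = {n: is_escaped_bracket('\\' * n + '>') for n in range(12)}
accepted = [n for n, ok in results.items() if ok]

print(accepted)  # [1, 3, 5, 7, 9]
```

Writing (?:\\\\)* instead of (?:\\\\){0,4} would lift the upper limit and accept any odd run of backslashes.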