I have the following regular expression for eliminating spaces, tabs, and new lines: [^ \n\t]
However, I want to expand this for certain additional characters, such as > and <.
I tried [^ \n\t<>], which works well for now, but I want the expression to not match if the < or > is preceded by a \.
I tried [^ \n\t[^\\]<[^\\]>], but this did not work.
Can any one of the sequences below occur in your input?
\\>
\\\>
\\\\>
\blank
\tab
\newline
...
If so, how do you propose to treat them?
If not, then a zero-width look-behind assertion will do the trick, provided that your regular expression engine supports them. This is the case in any engine that supports Perl-style regular expressions (Perl, PHP, etc.):
(?<!\\)[ \n\t<>]
The above will match any unescaped space, newline, tab, or angle bracket. More generically, using \s to denote any whitespace character (including \r), although this variant no longer covers < and >:
(?<!\\)\s
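To see the look-behind version in action, here is a minimal sketch in Python, whose re module supports fixed-width look-behind. The function name split_unescaped and the sample string are only illustrative; they are not part of the question.

```python
import re

# Any space, newline, tab, '<' or '>' that is NOT preceded by a backslash.
UNESCAPED_DELIM = re.compile(r'(?<!\\)[ \n\t<>]')

def split_unescaped(text):
    """Split on unescaped delimiters and drop the empty pieces."""
    return [piece for piece in UNESCAPED_DELIM.split(text) if piece]

sample = r'a\<b <c> d\>e'
print(split_unescaped(sample))  # ['a\\<b', 'c', 'd\\>e']
```

The escaped \< and \> stay inside their tokens, while the bare space, < and > act as separators.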
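Alternatively, match the complement (any character that is not a delimiter, plus escaped angle brackets) without a zero-width look-behind assertion, though arguably less efficiently. The escaped alternative is listed first so that \< and \> are consumed as a unit:
(?:\\[<>]|[^ \n\t<>])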
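As a rough illustration of the complementary approach, the sketch below repeats the group with + to collect whole tokens in Python. It assumes the escaped alternative is tried before the plain character class, so that a backslash followed by < or > is taken together; the name tokens is hypothetical.

```python
import re

# One token is a run of escaped angle brackets or ordinary non-delimiter characters.
# The escaped alternative is tried first so that "\<" and "\>" are consumed whole.
TOKEN = re.compile(r'(?:\\[<>]|[^ \n\t<>])+')

def tokens(text):
    """Return every maximal run of non-delimiter characters."""
    return TOKEN.findall(text)

print(tokens(r'a\<b <c> d\>e'))  # ['a\\<b', 'c', 'd\\>e']
```

Here the pattern describes what to keep rather than what to split on, so no look-behind is needed.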
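You may also use a variation of the latter to handle the \\>, \\\>, \\\\>, etc. cases as well, up to some finite number of preceding backslashes. An odd run of backslashes (1, 3, 5, 7 or 9) can be written as one backslash followed by up to four pairs:
(?:(?:^|[^\\<>])\\(?:\\\\){0,4}[<>]|[^ \n\t<>])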
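If a fixed upper bound on the number of backslashes is too limiting, a different technique is to capture the whole run of backslashes and check its parity in code. The sketch below is not the pattern above; it assumes that \\ denotes an escaped backslash, so an angle bracket counts as escaped only when an odd number of backslashes precedes it. The name unescaped_brackets is hypothetical.

```python
import re

# A (possibly empty) run of backslashes followed by an angle bracket.
BRACKET = re.compile(r'(\\*)([<>])')

def unescaped_brackets(text):
    """Return the indexes of '<' or '>' preceded by an even number of backslashes."""
    positions = []
    for m in BRACKET.finditer(text):
        # Even count (including zero): the backslashes pair up, the bracket is bare.
        if len(m.group(1)) % 2 == 0:
            positions.append(m.start(2))
    return positions

print(unescaped_brackets(r'a\>b\\>c\\\>d>'))  # [6, 13]
```

In the sample, \> and \\\> are treated as escaped, while \\> and the final bare > are reported as unescaped.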