
Using escape characters inside grep

Tags: regex, escaping

I have the following regular expression for eliminating spaces, tabs, and new lines: [^ \n\t]

However, I want to expand this for certain additional characters, such as > and <.

I tried [^ \n\t<>], which works well for now, but I want the expression to not match if the < or > is preceded by a \.

I tried [^ \n\t[^\\]<[^\\]>], but this did not work.

samoz asked Mar 25 '09 18:03



1 Answer

Can any one of the sequences below occur in your input?

\\>
\\\>
\\\\>
\blank
\tab
\newline
...

If so, how do you propose to treat them?

If not, then zero-width look-behind assertions will do the trick, provided that your regular expression engine supports them. This is the case in any engine that supports Perl-style regular expressions (Perl, PHP, etc.):

 (?<!\\)[ \n\t<>]

The above matches any unescaped space, newline, tab, or angle bracket. More generically (using \s to denote any whitespace character, including \r):

 (?<!\\)[\s<>]
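As a rough illustration, here is how the first look-behind pattern behaves in Python's re module, which supports fixed-width look-behind. The sample string is a made-up input, not something from the question. Note the last line: a > preceded by two backslashes is still treated as escaped, which is exactly the kind of case asked about at the top of this answer.

```python
import re

# Unescaped space, newline, tab, or angle bracket (pattern from the answer).
UNESCAPED_DELIM = re.compile(r"(?<!\\)[ \n\t<>]")

# Hypothetical input: "\>" is meant to be an escaped bracket.
sample = r"a<b\>c d"

# Offsets of the delimiters that are not preceded by a backslash.
positions = [m.start() for m in UNESCAPED_DELIM.finditer(sample)]

# Splitting on those delimiters keeps the escaped bracket inside its token.
tokens = UNESCAPED_DELIM.split(sample)

# Limitation: the look-behind inspects only one character, so a ">" that
# follows an escaped backslash ("\\>") is not reported as a delimiter.
double_backslash_hit = UNESCAPED_DELIM.search(r"\\>")
```

With this input, splitting yields three tokens and the \> stays inside the middle one.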

Alternatively, the complementary pattern avoids the zero-width look-behind assertion (though it is arguably less efficient). It matches a single character that is not a delimiter, or an escaped angle bracket:

 (?:[^ \n\t<>]|\\[<>])
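A minimal sketch of using this group to pull out tokens in Python, under one assumption that is mine and not the answer's: when the group is repeated with +, the escaped alternative is listed first. In the order written above, the backslash of \> is taken by the character class on its own, and the match then stops at the bracket.

```python
import re

# Hypothetical input: "\>" is meant to be an escaped bracket.
sample = r"a<b\>c d"

# Assumption: escaped alternative first, so "\>" is consumed as a pair.
TOKEN = re.compile(r"(?:\\[<>]|[^ \n\t<>])+")
tokens = TOKEN.findall(sample)

# Same group in the order given in the answer, repeated with "+".
# The class matches the lone backslash first, so the token is cut short.
ORIGINAL_ORDER = re.compile(r"(?:[^ \n\t<>]|\\[<>])+")
original_order = ORIGINAL_ORDER.findall(sample)
```

The ordering only matters for unanchored, repeated use like this. When the whole string must match, backtracking lets either order succeed.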

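You may also use a variation of the latter to handle the \\>, \\\>, \\\\> etc. cases as well, up to some finite number of preceding backslashes. A bracket is escaped when it is preceded by an odd number of backslashes, and the run must not itself follow another backslash. An odd run of one to nine backslashes can be written as up to four backslash pairs followed by a single backslash:

 (?:[^ \n\t<>]|(?:^|[^\\<>])(?:\\\\){0,4}\\[<>])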
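The odd-number-of-backslashes idea can be sketched in Python as follows. A quantifier takes only a minimum and a maximum, so the sketch expresses "1, 3, 5, 7 or 9 backslashes" as zero to four backslash pairs plus one more backslash. The helper name has_escaped_bracket is hypothetical and serves only as illustration.

```python
import re

# An odd run of 1 to 9 backslashes before "<" or ">": up to four
# backslash pairs, one more backslash, then the bracket. The run must
# begin at the start of the string or right after a non-backslash.
ESCAPED_BRACKET = re.compile(r"(?:^|[^\\])((?:\\\\){0,4}\\[<>])")

def has_escaped_bracket(text: str) -> bool:
    """Return True if text contains a bracket escaped by 1 to 9 backslashes."""
    return ESCAPED_BRACKET.search(text) is not None

one = has_escaped_bracket(r"\>")        # one backslash: escaped
two = has_escaped_bracket(r"\\>")       # two: the backslash is escaped
three = has_escaped_bracket(r"a\\\>")   # three: escaped again
# Eleven backslashes exceed the bound of nine, so this is not detected.
eleven = has_escaped_bracket("\\" * 11 + ">")
```

The last case shows the cost of the finite bound: runs longer than nine backslashes fall outside the pattern, so the limit should be chosen with the expected input in mind.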
vladr answered Nov 10 '22 04:11
