
How do you use a plus symbol with a character class as part of a regular expression?

In Cygwin, this does not return a match:

$ echo "aaab" | grep '^[ab]+$'

But this does return a match:

$ echo "aaab" | grep '^[ab][ab]*$'
aaab

Aren't the two expressions equivalent? Is there any way to express "one or more characters of the character class" without typing the character class twice (as in the second example)?

According to Regular-Expressions.info the two expressions should be the same, but perhaps that site does not cover grep under bash in Cygwin.

Charles Holbrow asked Dec 07 '22 22:12



2 Answers

grep has multiple "modes" of matching. By default it uses basic regular expressions, in which a number of metacharacters, including +, are treated as literal characters unless they are escaped with a backslash. You can put grep into extended mode (-E) or Perl mode (-P) so that an unescaped + is evaluated as a quantifier.

From man grep:

Matcher Selection
  -E, --extended-regexp
     Interpret PATTERN as an extended regular expression (ERE, see below).  (-E is specified by POSIX.)

  -P, --perl-regexp
     Interpret PATTERN as a Perl regular expression.  This is highly experimental and grep -P may warn of unimplemented features.


Basic vs Extended Regular Expressions
  In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

  Traditional egrep did not support the { meta-character, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

  GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification.  For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression.  POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.

Alternatively, you can use egrep instead of grep -E, although egrep is obsolescent and recent GNU grep releases print a warning when it is invoked.
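
To see the modes side by side, here is a minimal sketch that runs the pattern from the question through basic and extended matching. It assumes GNU grep, since \+ in a basic regular expression is a GNU extension; the helper name matches is hypothetical and only exists for this illustration. The -P mode is left out because it depends on grep having been built with PCRE support.

```shell
#!/bin/sh
# Hypothetical helper: report whether a pattern matches the sample line "aaab".
# Extra grep options (for example -E) may be passed after the pattern.
matches() {
    pattern=$1
    shift
    if printf '%s\n' "aaab" | grep -q "$@" -e "$pattern"; then
        echo "match"
    else
        echo "no match"
    fi
}

bre_plain=$(matches '^[ab]+$')      # basic mode: + is a literal plus sign
bre_escaped=$(matches '^[ab]\+$')   # basic mode, GNU extension: \+ is a quantifier
ere_plain=$(matches '^[ab]+$' -E)   # extended mode: + is a quantifier

echo "basic, unescaped +:    $bre_plain"
echo "basic, escaped \\+:     $bre_escaped"
echo "extended, unescaped +: $ere_plain"
```

Only the first case should report "no match", which is the behavior described in the question.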

Daniel Vandersluis answered Dec 09 '22 12:12



In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).

So use the backslashed version:

$ echo aaab | grep '^[ab]\+$'
aaab

Or activate extended syntax:

$ echo aaab | egrep '^[ab]+$'
aaab
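
If the script also has to run with a grep other than GNU grep, one further option (not mentioned in the quoted man page excerpt, so treat it as a suggestion rather than part of the answer) is the interval expression, which POSIX defines for basic regular expressions. A short sketch, with variable names chosen only for illustration:

```shell
#!/bin/sh
# \{1,\} is the POSIX basic-regex interval for "one or more of the preceding item".
bre_interval=$(echo aaab | grep '^[ab]\{1,\}$')

# The same interval in extended syntax drops the backslashes.
ere_interval=$(echo aaab | grep -E '^[ab]{1,}$')

# An empty line contains no character from the class, so the count of
# matching lines is 0 (grep exits non-zero here, hence the "|| true").
empty_count=$(echo "" | grep -c '^[ab]\{1,\}$' || true)

echo "$bre_interval"
echo "$ere_interval"
echo "$empty_count"
```

Both interval forms name the character class only once, which is what the question asks for.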
Josh Lee answered Dec 09 '22 12:12
