Grep and regex - why am I escaping curly braces?

I'm deeply puzzled by the way grep seems to parse a regex:

$ echo "@NS500287" | grep '^@NS500[0-9]{3}'
#nothing
$ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
@NS500287

That can't be right. Why am I escaping curly brackets that are part of a "match the previous, N times" component (and not, say, the square brackets as well)?

Shouldn't escaping be necessary only when I'm writing a regex that actually matches { and } as literal characters in the query string?

More of a cri de coeur than anything else, but I'm curious about the answer.

asked Nov 06 '14 15:11 by Justin St. Giles Payne


2 Answers

This is because grep defaults to basic regular expressions (BRE), in which { and } only take on their special "repeat N times" meaning when handled differently from ordinary characters. Otherwise, they are treated as literal { and }.

You can either escape like you did:

$ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
@NS500287

or use grep -E:

$ echo "@NS500287" | grep -E '^@NS500[0-9]{3}'
@NS500287

Without escaping or -E, a brace simply matches itself:

$ echo "he{llo" | grep "{"
he{llo

From man grep:

-E, --extended-regexp

Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)

...

REGULAR EXPRESSIONS

A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.

grep understands three different versions of regular expression syntax: “basic,” “extended” and “perl.” In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality, and are documented in pcresyntax(3) and pcrepattern(3), but may not be available on every system.

...

Basic vs Extended Regular Expressions

In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
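The same rule covers the other meta-characters in that list, not just braces. Here is a minimal sketch using + and |, assuming GNU grep (the backslashed \+ in basic mode is a GNU extension rather than something POSIX guarantees). The sample text and variable names are made up for illustration.

```shell
# Made-up sample input, one candidate string per line.
sample='ab
aab
a+b
cat
dog'

# BRE (default): a bare + is a literal plus sign, so only "a+b" is counted.
bre_plain=$(printf '%s\n' "$sample" | grep -c 'a+b')

# BRE with the backslashed form (GNU extension): one or more "a", then "b".
bre_escaped=$(printf '%s\n' "$sample" | grep -c 'a\+b')

# ERE (-E): the unescaped + is already the repetition operator.
ere_plain=$(printf '%s\n' "$sample" | grep -c -E 'a+b')

# ERE alternation likewise needs no backslash.
ere_alt=$(printf '%s\n' "$sample" | grep -c -E 'cat|dog')

echo "$bre_plain $bre_escaped $ere_plain $ere_alt"
```

With GNU grep the counts should differ only for the first pattern, where + is taken literally.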

answered Oct 09 '22 04:10 by fedorqui 'SO stop harming'


The answer relates to the difference between Basic Regular Expressions (BREs) and Extended ones (EREs).

  • In BRE mode (i.e. when you call grep with no option to specify otherwise), { and } are interpreted as literal characters. Escaping them with \ makes them a repetition count for the preceding pattern.

  • If you were to use grep -E instead (ERE mode), you would be able to use { and } without escaping to refer to the count. In ERE mode, escaping the braces causes them to be interpreted literally instead.
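The two bullets can be checked side by side. This is a rough sketch, assuming a POSIX-style grep; the helper name matches is hypothetical and exists only to print whether grep found anything.

```shell
# Hypothetical helper: prints "yes" if grep finds a match, "no" otherwise.
# Usage: matches INPUT [grep options...] PATTERN
matches() {
    input=$1
    shift
    if printf '%s\n' "$input" | grep -q "$@"; then
        echo yes
    else
        echo no
    fi
}

# BRE (default): bare braces are literal, \{ \} form the repetition count.
matches '@NS500287' '^@NS500[0-9]{3}'       # looks for a literal "{3}"
matches '@NS500287' '^@NS500[0-9]\{3\}'     # looks for three digits
matches 'a{3}'      'a{3}'                  # literal braces in the input

# ERE (-E): the roles are swapped.
matches '@NS500287' -E '^@NS500[0-9]{3}'    # looks for three digits
matches 'a{3}'      -E 'a\{3\}'             # literal braces in the input
matches 'a{3}'      -E 'a{3}'               # looks for "aaa"
```

So to match an actual { or } you leave it bare in BRE and backslash it in ERE, which is the reverse of what you do to get the counting behaviour.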

answered Oct 09 '22 03:10 by Tom Fenech