
How can I match square bracket in regex with grep?

Tags: regex, grep, bash

I am trying to match both [ and ] with grep, but have only succeeded in matching [. No matter what I try, I can't get it to match ].

Here's a code sample:

echo "fdsl[]" | grep -o "[ a-z]\+" #this prints fdsl
echo "fdsl[]" | grep -o "[ \[a-z]\+" #this prints fdsl[
echo "fdsl[]" | grep -o "[ \]a-z]\+" #this prints nothing
echo "fdsl[]" | grep -o "[ \[\]a-z]\+" #this prints nothing

Edit: My original regex, on which I need to do this, is this one:

echo "fdsl[]" | grep -o "[ \[\]\t\na-zA-Z\/:\.0-9_~\"'+,;*\=()$\!@#&?-]\+" 
#this prints nothing

N.B.: I have tried all the answers from this post, but none of them worked in this particular case. And I need to use those brackets inside [].

Jahid asked May 05 '15 04:05


3 Answers

According to the BRE/ERE Bracket Expression section of the POSIX regex specification:

  1. [...] The right-bracket ( ']' ) shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial circumflex ( '^' ), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as "[.].]" ) or is the ending right-bracket for a collating symbol, equivalence class, or character class. The special characters '.', '*', '[', and '\' (period, asterisk, left-bracket, and backslash, respectively) shall lose their special meaning within a bracket expression.

and

  1. [...] If a bracket expression specifies both '-' and ']', the ']' shall be placed first (after the '^', if any) and the '-' last within the bracket expression.

Therefore, your regex should be:

echo "fdsl[]" | grep -Eo "[][ a-z]+"

Note the -E flag, which tells grep to use ERE, which supports the + quantifier. The + quantifier is not supported in BRE (the default mode).
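As a minimal sketch of the ordering rule quoted above: put ] first and - last, and both are taken literally. The sample string is made up for illustration, and note that -o is a grep extension rather than something POSIX specifies.

```shell
# Hypothetical sample input; any string with brackets and a hyphen works.
sample='a-b[c]'

# ']' comes first and '-' comes last, so both are literal members of the set.
# 'a-z' in the middle is still a range, and '[' needs no escaping.
matched=$(printf '%s\n' "$sample" | grep -Eo '[][a-z-]+')
printf '%s\n' "$matched"   # prints a-b[c]
```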

The solution in Mike Holt's answer, "[][a-z ]\+" with an escaped +, works because it's run on GNU grep, which extends the BRE grammar so that \+ means "repeat once or more". It's actually undefined behavior according to the POSIX standard (which means that an implementation can give it meaningful behavior and document it, or throw a syntax error, or whatever).

If you are fine with assuming that your code will only ever run in a GNU environment, then it's totally fine to use Mike Holt's answer. Taking sed as an example, you are stuck with BRE when you use POSIX sed (at the time of writing there is no flag to switch over to ERE), and it's cumbersome to write even simple regular expressions in POSIX BRE, where the only defined repetition operators are * and the interval \{m,n\}.
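For what it's worth, POSIX BRE does define interval expressions, so "one or more" can be spelled \{1,\} without relying on the GNU \+ extension. A rough sketch with sed, using a made-up input line:

```shell
# Hypothetical input line for illustration.
line='fdsl[]!!'

# \{1,\} is the BRE interval for "one or more"; & in the replacement
# stands for the text that was matched.
wrapped=$(printf '%s\n' "$line" | sed 's/[][a-z]\{1,\}/<&>/')
printf '%s\n' "$wrapped"   # prints <fdsl[]>!!
```

It is more verbose than +, which is the cumbersomeness mentioned above, but it should behave the same on any POSIX-conforming sed.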

Original regex

Note that grep consumes the input line by line and then checks whether each line matches the regex. Therefore, even if you use the -P flag with your original regex, \n is always redundant, as the regex can't match across lines.

While it is possible to match a horizontal tab without the -P flag, I think it is more natural to use the -P flag for this task.
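If you do want to avoid -P, one possible approach is to put a literal tab character into the bracket expression yourself, since \t has no special meaning there. A sketch, assuming a shell variable named tab (my own name, not from the question):

```shell
# Capture a literal tab character; command substitution keeps it
# because only trailing newlines are stripped.
tab=$(printf '\t')

# ']' first, then the literal tab, then '[' and the a-z range.
found=$(printf 'fds\tl[]<>\n' | grep -Eo "[]${tab}[a-z]+")
printf '%s\n' "$found"
```

This should print the run up to and including the brackets, with the tab preserved, and stop at the < character.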

Given this input:

$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\!@#$%^&*()_+-=~\`89"
fds     l[]kSAJD<>?,./:";'{}|[]\!@#$%^&*()_+-=~`89

The original regex in the question works with a small modification (unescape the + at the end) once the -P flag is used:

$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\!@#$%^&*()_+-=~\`89" | grep -Po "[ \[\]\t\na-zA-Z\/:\.0-9_~\"'+,;*\=()$\!@#&?-]+"
fds     l[]kSAJD
?,./:";'
[]
!@#$
&*()_+-=~
89

We can also remove \n (since it is redundant, as explained above) and a few other unnecessary escapes:

$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\\!@#$%^&*()_+-=~\`89" | grep -Po "[ \[\]\ta-zA-Z/:.0-9_~\"'+,;*=()$\!@#&?-]+"
fds     l[]kSAJD
?,./:";'
[]
!@#$
&*()_+-=~
89
nhahtdh answered Oct 01 '22 23:10


One issue is that [ is a special character in the expression, and it cannot be escaped with \ (at least not in my flavors of grep). The solution is to write it as [[].
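A small sketch of that idea, with a made-up sample string: a one-character bracket expression matches each bracket literally, with no backslash involved.

```shell
# Hypothetical sample input.
sample='a[1]'

# '[[]' is a set containing only '['.
open_br=$(printf '%s\n' "$sample" | grep -o '[[]')

# '[]]' is a set containing only ']' (it comes first, so it is literal).
close_br=$(printf '%s\n' "$sample" | grep -o '[]]')

printf '%s %s\n' "$open_br" "$close_br"   # prints [ ]
```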

skotka answered Oct 02 '22 01:10


According to regular-expressions.info:

In most regex flavors, the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash.

... and ...

The closing bracket (]), the caret (^) and the hyphen (-) can be included by escaping them with a backslash, or by placing them in a position where they do not take on their special meaning.

So, assuming that the particular flavor of regular expressions syntax supported by grep conforms to this, then I would have expected that "[ a-z[\]]\+" should have worked.

With my version of grep (GNU grep 2.14), however, this regex matches only the "[]" at the end of "fdsl[]".
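My reading of why this happens (an interpretation based on the POSIX rule that a backslash is not special inside a bracket expression, tried only against GNU grep): the set ends at the first ], so the backslash becomes an ordinary member of the set and the second ] becomes a literal that must follow it.

```shell
# The pattern is parsed as two pieces:
#   [ a-z[\]   one of: space, a-z, '[', or a literal backslash
#   ]\+        followed by one or more ']' (\+ is a GNU extension in BRE)
partial=$(echo 'fdsl[]' | grep -o '[ a-z[\]]\+')
printf '%s\n' "$partial"     # prints []

# The backslash really is just another character in the set:
backslash=$(printf '%s\n' 'a\b' | grep -o '[\]')
printf '%s\n' "$backslash"   # prints \
```

The only place in "fdsl[]" where a character from the set is followed by ] is the "[" before the final "]", which is why only "[]" comes out.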

I then tried the other technique mentioned in that quote (putting the ] in a position within the character class where it cannot take on its special meaning), and it seems to have worked:

$ echo "fdsl[]" | grep -o "[][a-z ]\+"
fdsl[]
Mike Holt answered Oct 02 '22 00:10