Non-greedy matching, as far as I know, is not part of Basic Regular Expressions (BRE) or Extended Regular Expressions (ERE). However, the behaviour of different versions of grep (BSD and GNU) seems to suggest otherwise.
For example, take the following string:
string="hello_my_dear_polo"
GNU grep: Following are a few attempts to extract hello from the string.
BRE Attempt (fails):
$ grep -o "hel.*\?o" <<< "$string"
hello_my_dear_polo
The output is the entire string, which suggests that the non-greedy quantifier does not work in BRE. Note that I have only escaped ? since * does not lose its meaning and need not be escaped.
ERE Attempt (fails):
$ grep -oE "hel.*?o" <<< "$string"
hello_my_dear_polo
Enabling the -E option yields the same output, suggesting that non-greedy matching is not part of ERE either. Escaping was not needed here since we are using ERE.
PCRE Attempt (succeeds):
$ grep -oP "hel.*?o" <<< "$string"
hello
Enabling the -P option for PCRE gives the desired output of hello, which suggests that the non-greedy quantifier is part of PCRE. Escaping was not needed here since we are using PCRE.
BSD grep: Here are a few attempts to extract hello from the string.
BRE Attempt (fails):
$ grep -o "hel.*\?o" <<< "$string"
Using BRE, I get no output from BSD grep.
ERE Attempt (succeeds):
$ grep -oE "hel.*?o" <<< "$string"
hello
After enabling the -E option, I was surprised to get the desired output. My question is about the output of this attempt.
PCRE Attempt (fails):
$ grep -oP "hel.*?o" <<< "$string"
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
Using the -P option gave me a usage error, which was expected since the BSD version of grep does not support PCRE.
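Since -P is only available on some builds of grep, a script that has to run on both GNU and BSD systems could probe for it first. The following is a minimal sketch; the helper names grep_supports_pcre and pick_dialect are made up for illustration and are not part of grep.

```shell
# Hypothetical helper: succeeds only if the local grep accepts -P (PCRE).
grep_supports_pcre() {
    # A trivial pattern is enough; the usage error from builds without -P
    # goes to stderr and is discarded.
    printf 'x\n' | grep -qP 'x' 2>/dev/null
}

# Hypothetical helper: print which regex dialect a script should rely on.
pick_dialect() {
    if grep_supports_pcre; then
        printf 'pcre\n'
    else
        printf 'ere\n'
    fi
}

pick_dialect
```

On a grep without PCRE support this should print ere, and a script can then avoid PCRE-only syntax such as .*? altogether.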
So my question is: why does ERE on BSD grep yield the desired output with a non-greedy quantifier, while GNU grep does not? Is this a bug, an undocumented feature of BSD egrep, or my misunderstanding of the output?
The double quantifier is simply a syntax error and could result in either an error message or undefined behavior. It would arguably be better if you got an error message.
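One plausible reading of the GNU output in the question is that the trailing ? was applied to the already-quantified .* as a whole, which would make it an ordinary optional group. That is an assumption about the implementation, not documented behaviour. The sketch below only uses the explicitly grouped form, which is well-defined ERE, to show that such a pattern matches exactly what the plain greedy one does.

```shell
string="hello_my_dear_polo"

# Plain greedy match: .* runs to the last "o" in the line.
greedy=$(printf '%s\n' "$string" | grep -oE 'hel.*o')

# Explicit optional group: still the longest match, so the "?" changes nothing.
grouped=$(printf '%s\n' "$string" | grep -oE 'hel(.*)?o')

printf 'greedy:  %s\ngrouped: %s\n' "$greedy" "$grouped"
```

Both variables should hold the entire string, the same result as the GNU ERE attempt in the question.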
Perl extensions to regex post-date POSIX by a large margin; at the time these tools were written, it was extremely unlikely that someone would try to use this wacky syntax for anything. Non-greedy matching was only introduced in Perl 5, in the mid-1990s.
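If you need the shortest match without relying on Perl syntax, a common workaround is a negated bracket expression, which is valid in both BRE and ERE. Here is a minimal sketch using the string from the question; it only stands in for a lazy quantifier when the terminator is a single character, as the o is here.

```shell
string="hello_my_dear_polo"

# [^o]* cannot run past the first "o", so the match ends there
# even though the quantifier itself is still greedy.
first=$(printf '%s\n' "$string" | grep -oE 'hel[^o]*o')

printf '%s\n' "$first"
```

This should print hello with both GNU and BSD grep, since nothing in the pattern depends on undefined behaviour.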