Currently prepping for RHCSA and learning regex. What's the difference between \b and \<?
They seem to do almost the exact same thing: Match the string in between the backslashes.
Example:
[root@RHEL8DEV etc]# grep '\<root\>' * 2>/dev/null | wc
105 327 3658
[root@RHEL8DEV etc]# grep '\broot\b' * 2>/dev/null | wc
105 327 3658
Even after reading on gnu.org, I'm still scratching my head.
\b
\b matches the empty string, but only at the beginning or end of a word. Thus, \bfoo\b matches any occurrence of foo as a separate word. \bballs?\b matches ball or balls as a separate word. \b matches at the beginning or end of the buffer regardless of what text appears next to it.
\< and \>
\< matches the empty string, but only at the beginning of a word. \< matches at the beginning of the buffer only if a word-constituent character follows.
\> matches the empty string, but only at the end of a word. \> matches at the end of the buffer only if the contents end with a word-constituent character.
Thanks for taking time to read this.
Only the manual page for your specific version of grep can reveal whether they are exactly equivalent. Neither is fully portable.
Traditionally, in some versions of egrep, \< would only match at a left word boundary, and \> at a right one. (However, Procmail, for example, took a shortcut and actually defines both identically.)
\b is a newer construct from Perl et al., and is direction neutral, i.e. it is true at a word boundary either on the left or on the right of a sequence of word characters.
I've personally found \b to be more broadly supported than \< and \>. The only exceptions I've encountered are vim and BSD sed, which support \< and \> but not \b.
As to their definitions: in PCRE terms, they are essentially
\< = (?<!\w)(?=\w) = word character on the right but not on the left
\> = (?<=\w)(?!\w) = word character on the left but not on the right
\b = (?:(?<!\w)(?=\w)|(?<=\w)(?!\w)) = either of the above
Regex101 can explain each of these regexes. Note that none of that site's four supported engines understands what \< and \> are supposed to do.
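The lookaround spellings above can be checked in any engine that supports lookbehind. Here is a minimal sketch using Python's re module (not PCRE itself, but it accepts the same lookaround syntax). The names WORD_START, WORD_END, EITHER and positions are made up for this illustration.

```python
import re

# Lookaround spellings of the three anchors, as listed in the answer.
WORD_START = r"(?<!\w)(?=\w)"              # stands in for \<
WORD_END = r"(?<=\w)(?!\w)"                # stands in for \>
EITHER = rf"(?:{WORD_START}|{WORD_END})"   # stands in for \b


def positions(pattern, text):
    """Return the offsets at which a zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text)]


sample = "foo bar"
starts = positions(WORD_START, sample)   # left edges of words
ends = positions(WORD_END, sample)       # right edges of words
both = positions(EITHER, sample)         # union of the two
builtin = positions(r"\b", sample)       # the engine's own \b

print(starts, ends, both, builtin)
```

For this sample, the start and end offsets are disjoint, and their union is the same set of offsets that the built-in \b reports, which is the "either of the above" relationship.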
Since PCRE explicitly prohibits special meanings for non-alphanumeric escapes, \< means "literal open angle-bracket" and therefore (?:\<|\>) means [<>] rather than \b. Standard Extended Regular Expressions do not have this explicit prohibition, though they also do not implement any such special meanings (items like \< and \> are non-standard extensions).
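Python's re module is not PCRE, but it treats an escaped non-alphanumeric character the same way, as a plain literal, so it can serve as a rough stand-in to illustrate the point. The sample strings below are invented for this sketch.

```python
import re

# In this engine \< is just an escaped "<", not a word-start anchor.
literal_hits = re.findall(r"\<y", "y <y")

# So an alternation of \< and \> behaves like the bracket class [<>].
alternation_hits = re.findall(r"(?:\<|\>)", "a<b>c")
bracket_hits = re.findall(r"[<>]", "a<b>c")

print(literal_hits, alternation_hits, bracket_hits)
```

Only the occurrence of y preceded by a literal < is found. The bare y at the start of the string is skipped, which is the opposite of what a word-start anchor would do.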
Also note that inside a character class, things differ. In most regex interpreters, [\b] means "literal backspace character" and is equivalent to [\010] or [\x08] (or \010 or \x08). Putting a zero-width item into a character class doesn't make any sense anyway.
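A quick way to see the character-class behaviour is the sketch below, again using Python's re module as one example of "most regex interpreters". Other engines may differ, and the sample text is invented for the illustration.

```python
import re

# Sample text with a real backspace character (0x08) after "ab".
text = "ab\x08 cd"

in_class = re.findall(r"[\b]", text)       # \b inside a class: backspace
octal_class = re.findall(r"[\010]", text)  # same character, octal escape
hex_class = re.findall(r"[\x08]", text)    # same character, hex escape

# Outside a class, \b is the zero-width word boundary again.
boundary_count = len(re.findall(r"\b", text))

print(in_class, octal_class, hex_class, boundary_count)
```

All three bracketed forms pick out the single backspace character, while the bare \b matches only empty strings at the edges of "ab" and "cd".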
An example of the differences, using GNU grep, which accepts both formats:
$ echo yes |grep '\<yes'
yes
$ echo yes |grep '\byes'
yes
$ echo yes |grep '\>yes'
# (no output here means it failed)
$
Here you can see that the directionality matters for \< and \> but not for \b.
Various support tests, command-line only (Debian Testing as of 2019/11/25 or FreeBSD 11.2 as noted):
$ echo y |grep '\<y' # GNU grep w/ BRE, Basic Regular Expression
y
$ echo y |grep -E '\<y' # GNU grep w/ ERE, Extended Regular Expression
y
$ echo y |grep -P '\<y' # GNU grep w/ libpcre, Perl-Compatible Regular Expression
$ echo y |perl -ne 'print if /\<y/' # perl proper
$ echo y |sed '/\<y/!d' # GNU sed with BRE
y
$ echo y |sed -r '/\<y/!d' # GNU sed with ERE
y
$ echo y |sed '/\<y/!d' # BSD sed with BRE (FreeBSD 11.2)
y
$ echo y |sed -E '/\<y/!d' # BSD sed with ERE (FreeBSD 11.2)
y
$ echo y |gawk '/\<y/' # GNU awk
y
$ echo y |mawk '/\<y/' # More POSIX-aligned
$
# python test (result printed as a list, in this case empty for no matches)
$ echo y |python3 -c 'import re,sys; print(re.findall(r"\<y", sys.stdin.read()))'
[]
grep -P (which uses libpcre, not always compiled into grep) does not match because PCRE doesn't recognize \< as anything but a literal < character.
$ echo y |grep '\by' # GNU grep w/ BRE, Basic regex
y
$ echo y |grep -E '\by' # GNU grep w/ ERE, Extended regex
y
$ echo y |grep -P '\by' # GNU grep w/ libpcre, Perl-compatible regex
y
$ echo y |perl -ne 'print if /\by/' # perl proper
y
$ echo y |sed '/\by/!d' # GNU sed with BRE
y
$ echo y |sed -r '/\by/!d' # GNU sed with ERE
y
$ echo y |sed '/\by/!d' # BSD sed with BRE (FreeBSD 11.2)
$ echo y |sed -E '/\by/!d' # BSD sed with ERE (FreeBSD 11.2)
$ echo y |gawk '/\by/' # GNU awk
$ echo y |mawk '/\by/' # POSIX-ish awk
$
# python test
$ echo y |python3 -c 'import re,sys; print(re.findall(r"\by", sys.stdin.read()))'
['y']
Note how BSD sed accepts \< but not \b, yet GNU sed accepts both.