Currently prepping for RHCSA and learning regex. What's the difference between \b and \< ?
They seem to do almost the exact same thing: Match the string in between the backslashes.
Example:
[root@RHEL8DEV etc]# grep '\<root\>' * 2>/dev/null | wc
105 327 3658
[root@RHEL8DEV etc]# grep '\broot\b' * 2>/dev/null | wc
105 327 3658
Even after reading on gnu.org, I'm still scratching my head.
\b:
\b matches the empty string, but only at the beginning or end of a word. Thus, \bfoo\b matches any occurrence of foo as a separate word. \bballs?\b matches ball or balls as a separate word. \b matches at the beginning or end of the buffer regardless of what text appears next to it.
\< and \>:
\< matches the empty string, but only at the beginning of a word. \< matches at the beginning of the buffer only if a word-constituent character follows.
\> matches the empty string, but only at the end of a word. \> matches at the end of the buffer only if the contents end with a word-constituent character.
Thanks for taking time to read this.
Only the manual page for your specific version of grep can reveal whether they are exactly equivalent. Neither is fully portable.
Traditionally, in some versions of egrep, \< would match only at a left word boundary and \> only at a right one. (Procmail, however, took a shortcut and actually defines both identically.)
\b is a newer construct from Perl et al., and is direction neutral, i.e. it is true at a word boundary either on the left or on the right of a sequence of word characters.
I've personally found \b to be more broadly supported than \< and \>. The only exceptions I've encountered are vim and BSD sed, which support \< and \> but not \b.
As to their definitions: in PCRE terms, they are essentially
\< = (?<!\w)(?=\w) = word character on the right but not on the left
\> = (?<=\w)(?!\w) = word character on the left but not on the right
\b = (?:(?<!\w)(?=\w)|(?<=\w)(?!\w)) = either of the above
Regex101 can explain each of these expressions, but note that none of that site's four supported engines understands what \< and \> are supposed to do.
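To check these definitions without GNU grep, here is a minimal sketch in Python, whose re module (like PCRE) has \b but no \< or \>. The names WORD_START, WORD_END, BOUNDARY and positions() are hypothetical, chosen for illustration; the patterns themselves are the lookaround definitions given above.

```python
import re

# Lookaround emulations of GNU's \< and \>, built from the definitions above.
# These names are hypothetical; Python's re module has no \< or \> of its own.
WORD_START = r"(?<!\w)(?=\w)"  # like \<: word char on the right, none on the left
WORD_END = r"(?<=\w)(?!\w)"    # like \>: word char on the left, none on the right
BOUNDARY = rf"(?:{WORD_START}|{WORD_END})"  # like \b: either of the above


def positions(pattern, text):
    """Return every offset in text where the zero-width pattern matches."""
    return [m.start() for m in re.finditer(pattern, text)]


line = "root:x:0:0:root"
print(positions(WORD_START, line))  # [0, 5, 7, 9, 11]
print(positions(WORD_END, line))    # [4, 6, 8, 10, 15]
print(positions(BOUNDARY, line))    # [0, 4, 5, 6, 7, 8, 9, 10, 11, 15]
print(positions(r"\b", line))       # same list as BOUNDARY
```

The built-in \b fires at exactly the union of the "start of word" and "end of word" positions, which is the sense in which it is direction neutral.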
Since PCRE explicitly states that a backslash followed by a character that is not a letter or digit takes away any special meaning that character may have, \< means "literal open angle bracket", and therefore (?:\<|\>) means [<>] rather than \b. Standard Extended Regular Expressions have no such explicit rule, though they do not define any special meaning for these escapes either (items like \< and \> are non-standard extensions).
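Python's re module follows the same Perl convention for escaped non-alphanumeric characters, so it can serve as a rough stand-in for PCRE here (an assumption: it is not libpcre itself). This sketch shows \< and \> behaving as plain angle brackets:

```python
import re

# A backslash before a non-alphanumeric character just means that character,
# so (?:\<|\>) matches literal angle brackets, exactly like [<>].
html = "<b>bold</b>"
print(re.findall(r"(?:\<|\>)", html))  # ['<', '>', '<', '>']
print(re.findall(r"[<>]", html))       # ['<', '>', '<', '>']

# Consequently \<y looks for the two characters "<y", not a word start.
print(re.search(r"\<y", "y"))           # None
print(re.search(r"\<y", "<y").group())  # <y
```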
Also note that things differ inside a character class. In most regex implementations, [\b] means "literal backspace character" and is equivalent to [\010] or [\x08] (or to \010 or \x08 outside a class). Putting a zero-width item into a character class wouldn't make sense anyway.
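A quick way to confirm the character-class behavior, again using Python's re module as a Perl-like example (its documentation states that \b inside a character class is a backspace):

```python
import re

BACKSPACE = "\x08"  # ASCII 8, the backspace control character

# Inside a character class, \b is a literal backspace, not a word boundary.
print(re.findall(r"[\b]", "a" + BACKSPACE + "b"))  # ['\x08']

# The octal and hex spellings match the same single character.
for pattern in (r"[\b]", r"[\010]", r"[\x08]", r"\010", r"\x08"):
    assert re.fullmatch(pattern, BACKSPACE), pattern

# Outside a class, \b is zero-width, so it can never consume a backspace.
print(re.fullmatch(r"\b", BACKSPACE))  # None
```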
An example of the differences, using GNU grep, which accepts both formats:
$ echo yes |grep '\<yes'
yes
$ echo yes |grep '\byes'
yes
$ echo yes |grep '\>yes'
# (no output here means it failed)
$
Here you can see that the directionality matters for \< and \> but not for \b.
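The same directionality experiment can be sketched in Python, substituting lookaround stand-ins for \< and \> (hypothetical names, since Python's re lacks those operators) and trying each boundary on both sides of the word:

```python
import re

WORD_START = r"(?<!\w)(?=\w)"  # stand-in for \<
WORD_END = r"(?<=\w)(?!\w)"    # stand-in for \>

# Label (the GNU spelling) -> equivalent Python pattern.
tests = {
    r"\<yes": WORD_START + "yes",
    r"\>yes": WORD_END + "yes",
    r"yes\>": "yes" + WORD_END,
    r"yes\<": "yes" + WORD_START,
    r"\byes": r"\byes",
    r"yes\b": r"yes\b",
}
for label, pattern in tests.items():
    found = re.search(pattern, "yes") is not None
    print(f"{label:6} {'match' if found else 'no match'}")
```

Only the \b variants match on both sides of the word; \< succeeds only before it and \> only after it, mirroring the GNU grep results.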
Various support tests, command-line only (Debian Testing as of 2019/11/25 or FreeBSD 11.2 as noted):
$ echo y |grep '\<y' # GNU grep w/ BRE, Basic Regular Expression
y
$ echo y |grep -E '\<y' # GNU grep w/ ERE, Extended Regular Expression
y
$ echo y |grep -P '\<y' # GNU grep w/ libpcre, Perl-Compatible Regular Expression
$ echo y |perl -ne 'print if /\<y/' # perl proper
$ echo y |sed '/\<y/!d' # GNU sed with BRE
y
$ echo y |sed -r '/\<y/!d' # GNU sed with ERE
y
$ echo y |sed '/\<y/!d' # BSD sed with BRE (FreeBSD 11.2)
y
$ echo y |sed -E '/\<y/!d' # BSD sed with ERE (FreeBSD 11.2)
y
$ echo y |gawk '/\<y/' # GNU awk
y
$ echo y |mawk '/\<y/' # More POSIX-aligned
$
# python test (result printed as an array, in this case empty for no matches)
$ echo y |python -c 'import re,sys; print(re.findall(r"\<y", sys.stdin.read()))'
[]
grep -P (which uses libpcre, not always compiled into grep) does not match because PCRE doesn't recognize \< as anything but a literal < character.
$ echo y |grep '\by' # GNU grep w/ BRE, Basic regex
y
$ echo y |grep -E '\by' # GNU grep w/ ERE, Extended regex
y
$ echo y |grep -P '\by' # GNU grep w/ libpcre, Perl-compatible regex
y
$ echo y |perl -ne 'print if /\by/' # perl proper
y
$ echo y |sed '/\by/!d' # GNU sed with BRE
y
$ echo y |sed -r '/\by/!d' # GNU sed with ERE
y
$ echo y |sed '/\by/!d' # BSD sed with BRE (FreeBSD 11.2)
$ echo y |sed -E '/\by/!d' # BSD sed with ERE (FreeBSD 11.2)
$ echo y |gawk '/\by/' # GNU awk
$ echo y |mawk '/\by/' # POSIX-ish awk
$
# python test
$ echo y |python -c 'import re,sys; print(re.findall(r"\by", sys.stdin.read()))'
['y']
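Note how BSD sed accepts \< but not \b, while GNU sed accepts both. GNU awk likewise accepts \< but not \b, because gawk treats \b as a backspace and spells its word-boundary operator \y instead.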