I have a data.frame named all that has a factor column whose levels include "word", "nonword", and some others. My goal is to select only the rows where the factor value is "word".
My solution, grep("\bword\b", all[,5]), returns nothing.
Why are the word boundaries not recognized?
Word boundary: \b. The word boundary \b matches positions where one side is a word character (usually a letter, digit, or underscore, though the exact set varies across engines) and the other side is not a word character (for instance, the beginning of the string or a space character).
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.
Details. A 'regular expression' is a pattern that describes a set of strings. Two types of regular expressions are used in R: extended regular expressions (the default) and Perl-like regular expressions, enabled by perl = TRUE. There is also fixed = TRUE, which treats the pattern as a literal string rather than a regular expression.
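To make the three matching modes concrete, here is a minimal sketch. The vector x is made up for illustration and is not the asker's data.

```r
# Hypothetical sample values (not from the original question)
x <- c("word", "nonword", "a.word")

# Default: extended regular expressions; "\\b" is a word boundary
default_hits <- grepl("\\bword\\b", x)

# Perl-like regular expressions; same pattern, same result for this input
perl_hits <- grepl("\\bword\\b", x, perl = TRUE)

# fixed = TRUE: the pattern is a literal substring, so "nonword" matches too
fixed_hits <- grepl("word", x, fixed = TRUE)

# With fixed = TRUE, "\\b" is a literal backslash followed by b, not a boundary
fixed_literal <- grepl("\\bword\\b", x, fixed = TRUE)

print(rbind(default_hits, perl_hits, fixed_hits, fixed_literal))
```

Note that fixed = TRUE cannot express a word boundary at all, so it is only a substitute when a plain substring test is what you want.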
\b. Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.
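A small sketch of what this definition means in practice, using made-up strings (an assumption for illustration only). The pattern only matches where "word" has a non-word character, or the start or end of the string, on both sides.

```r
# Hypothetical test strings chosen to exercise the boundary definition
x <- c("word", "nonword", "words", "a word here", "word_1", "word-1")

# "\\b" in R source code is the regex \b
hits <- grepl("\\bword\\b", x)

# Show each string next to whether the pattern matched
print(data.frame(x = x, hits = hits))
```

Here "nonword" and "words" fail because a letter sits directly next to "word", and "word_1" fails because the underscore counts as a word character. "word-1" matches because the hyphen does not.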
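In R, you need to double the backslash, because \ is itself the escape character in R string literals ("\b" on its own is a single backspace character):
grep("\\bword\\b", all[[5]])
Note all[[5]] (or all[, 5]) rather than all[5]: the latter is a one-column data.frame, which grep() does not search element by element.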
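A quick way to see the difference, as a sketch: the factor f below is a made-up stand-in for the column in the question. The raw-string line assumes R 4.0.0 or later and is a parse error on older versions.

```r
# "\b" in an R string literal is ONE character: backspace (code point 8)
single_len <- nchar("\b")
single_code <- utf8ToInt("\b")

# "\\b" is TWO characters: a backslash followed by the letter b
double_len <- nchar("\\b")

# Hypothetical factor standing in for the column in the question
f <- factor(c("word", "nonword", "other", "word"))

# Searches for backspace + "word" + backspace, which never occurs
no_hits <- grep("\bword\b", f)

# Searches for "word" between two word boundaries
hits <- grep("\\bword\\b", f)

# Assumes R >= 4.0.0: a raw string avoids the doubling
same_pattern <- identical(r"(\bword\b)", "\\bword\\b")

print(list(single_len = single_len, single_code = single_code,
           double_len = double_len, no_hits = no_hits, hits = hits,
           same_pattern = same_pattern))
```

Printing the pattern with cat("\\bword\\b") is another easy check: it shows the characters the regex engine actually receives.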
Alternative solutions:
grep("^word$", all[[5]])
which(all[[5]] == "word")
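Since the stated goal is to select rows, the indices still have to be used to subset the data.frame. A minimal sketch, assuming a made-up data.frame df with a factor column type (both names are hypothetical; df is used instead of all to avoid masking base::all):

```r
# Hypothetical stand-in for the data.frame in the question
df <- data.frame(
  id   = 1:5,
  type = factor(c("word", "nonword", "word", "other", "nonword"))
)

# Anchored regex: the whole value must be exactly "word"
rows_regex <- df[grepl("^word$", df$type), ]

# Plain equality; which() drops NA values instead of producing NA rows
rows_eq <- df[which(df$type == "word"), ]

# subset() is a convenient interactive alternative
rows_subset <- subset(df, type == "word")

print(rows_regex)
```

For an exact match on a whole value, the equality test is usually the simplest choice. A regex is only needed when the value can be embedded in longer text. The unused factor levels remain on the result, and droplevels() removes them if that matters.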