 

Detecting word boundaries with regex in a data frame in R

I have a data.frame named all with a factor column whose levels include "word", "nonword" and some others. My goal is to select only the rows where that column has the value "word".

My attempt, grep("\bword\b", all[,5]), returns nothing.

How come word boundaries are not recognized?

asked Jul 28 '13 at 07:07 by Daniel Kislyuk



People also ask

How does word boundary work in regex?

Word boundary: \b. The word boundary \b matches positions where one side is a word character (usually a letter, digit or underscore, although the exact set varies across regex engines) and the other side is not a word character (for instance, the beginning or end of the string, or a space character).
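A minimal sketch of this in R, using a hypothetical vector x (not from the question). Note that the pattern is written "\\bword\\b" in R source so that the regex engine actually receives \b:

```r
# Hypothetical values: some contain "word" only inside a longer token
x <- c("word", "nonword", "a word here", "words")

# "\\b" in R source becomes the regex \b, a zero-width word boundary.
# "nonword" and "words" fail because a word character touches "word".
grepl("\\bword\\b", x)               # TRUE FALSE TRUE FALSE
grepl("\\bword\\b", x, perl = TRUE)  # same result with the PCRE engine
```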

Which escape sequence indicates a word boundary in regex?

The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a “word boundary”. This match is zero-length.
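Because \b is zero-length, it never becomes part of the matched text. The sketch below (with a hypothetical string s, using perl = TRUE) shows that only the four characters of "word" are matched and replaced, while the neighbouring space and period are left alone:

```r
s <- "a word."

# \b consumes no characters, so the match is exactly "word"
m <- regexpr("\\bword\\b", s, perl = TRUE)
m[1]                       # 3: match starts at the "w"
attr(m, "match.length")    # 4: only the letters of "word"
regmatches(s, m)           # "word"

# Replacing the match leaves the surrounding space and period untouched
replaced <- sub("\\bword\\b", "WORD", s, perl = TRUE)
replaced                   # "a WORD."
```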

Can you use regex in R?

A 'regular expression' is a pattern that describes a set of strings. R uses two types of regular expressions: extended regular expressions (the default) and Perl-like regular expressions, selected with perl = TRUE. There is also fixed = TRUE, which can be considered to use a literal regular expression.
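A small comparison of the three modes on a hypothetical vector x. Both regex engines understand \b, but fixed = TRUE treats the pattern as a literal string, so "\\bword\\b" is then searched for as a backslash, a "b", "word", and so on:

```r
x <- c("word", "nonword", "words")

# Extended regex (the default) and Perl-like regex both support \b
grepl("\\bword\\b", x)                # TRUE FALSE FALSE
grepl("\\bword\\b", x, perl = TRUE)   # TRUE FALSE FALSE

# fixed = TRUE: no metacharacters, just a literal substring search
grepl("\\bword\\b", x, fixed = TRUE)  # FALSE FALSE FALSE (no literal backslashes in x)
grepl("word", x, fixed = TRUE)        # TRUE TRUE TRUE (every value contains "word")
```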

What does \b represent in regex?

\b matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.
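Since digits and the underscore count as \w, they block a boundary just like letters do, while punctuation does not. A sketch with hypothetical values, using perl = TRUE so that the PCRE definition of \w (letters, digits, underscore) applies:

```r
x <- c("word", "word_1", "word1", "word-1", "(word)")

# "_" and "1" are word characters, so there is no boundary after "word";
# "-", "(" and ")" are non-word characters, so a boundary exists
grepl("\\bword\\b", x, perl = TRUE)   # TRUE FALSE FALSE TRUE TRUE
```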


1 Answer

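In R, the backslash has to be doubled. Inside an R string literal, "\b" is the backspace character, so the regex engine never receives the word-boundary escape. Write \\b instead: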

grep("\\bword\\b", all[, 5])
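A quick check of why the pattern from the question matched nothing, sketched with a hypothetical vector x in place of the real column. The single-backslash string is 6 characters long because each "\b" is one backspace (0x08):

```r
pat_single  <- "\bword\b"    # backspace + "word" + backspace
pat_escaped <- "\\bword\\b"  # the regex engine receives \bword\b

nchar(pat_single)                          # 6
nchar(pat_escaped)                         # 8
identical(substr(pat_single, 1, 1), "\x08")  # TRUE: first char is a backspace

x <- c("word", "nonword", "word list")
grepl(pat_single, x)    # FALSE FALSE FALSE: no value contains backspaces
grepl(pat_escaped, x)   # TRUE FALSE TRUE
```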

Alternative solutions:

grep("^word$", all[, 5])

which(all[, 5] == "word")
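A sketch comparing the approaches on a made-up data frame (the columns and values are hypothetical; only the name all and the factor in column 5 mirror the question). Note that \bword\b also matches a level such as "word-like", because "-" is not a word character, whereas ^word$ and == select only exact matches, which is what the question asks for:

```r
# Hypothetical stand-in for the question's data frame; column 5 is a factor
all <- data.frame(
  id   = 1:5,
  a    = letters[1:5],
  b    = 5:1,
  c    = c(0.1, 0.2, 0.3, 0.4, 0.5),
  type = factor(c("word", "nonword", "word", "word-like", "word"))
)

rows_regex  <- grep("\\bword\\b", all[, 5])  # 1 3 4 5 (includes "word-like")
rows_anchor <- grep("^word$", all[, 5])      # 1 3 5
rows_eq     <- which(all[, 5] == "word")     # 1 3 5

# Select the matching rows themselves
words_only <- all[all[, 5] == "word", ]
nrow(words_only)                             # 3
```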
answered Oct 26 '22 at 22:10 by Sven Hohenstein
