Complete word matching using grepl in R

Consider the following example:

> testLines <- c("I don't want to match this","This is what I want to match")
> grepl('is',testLines)
[1] TRUE TRUE

What I want, though, is to match 'is' only when it stands alone as a single word. From reading a bit of Perl documentation, it seemed that the way to do this is with \b, an anchor that marks a word boundary before or after the pattern, i.e. \bword\b matches 'word' but not 'sword'. So I tried the following example, with the perl argument set to TRUE:

> grepl('\bis\b',testLines,perl=TRUE)
[1] FALSE FALSE

The output I'm looking for is FALSE TRUE.

aaron asked Jun 29 '11 23:06


2 Answers

"\<" is another escape sequence for the beginning of a word, and "\>" is the end. In R strings you need to double the backslashes, so:

> grepl("\\<is\\>", c("this", "who is it?", "is it?", "it is!", "iso"))
[1] FALSE  TRUE  TRUE  TRUE FALSE

Note that this matches "is!" but not "iso".
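As a minimal sketch, the same anchors can be applied to the testLines vector from the question and compared against the unanchored pattern. The variable names whole_word and substring_match are only illustrative and are not part of the original answer.

```r
testLines <- c("I don't want to match this", "This is what I want to match")

# "\\<" and "\\>" anchor the start and end of a word in R's default regex engine
whole_word <- grepl("\\<is\\>", testLines)
print(whole_word)

# Without the anchors, the "is" inside "this" and "This" also counts as a match
substring_match <- grepl("is", testLines)
print(substring_match)
```

Only the second line contains "is" as a standalone word, so only it should match the anchored pattern.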

Tommy answered Oct 26 '22 02:10


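You need to double the backslashes so that the escape reaches the regex engine. In an R string literal, a single "\b" is a backspace character, not a word boundary: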

> grepl("\\bis\\b",testLines)
[1] FALSE  TRUE
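The sketch below illustrates why the single-backslash version in the question fails, assuming the testLines vector from the question. The names single, double, and raw are illustrative only. The raw-string form r"(...)" is an alternative that assumes R 4.0.0 or later and is not part of the original answer.

```r
testLines <- c("I don't want to match this", "This is what I want to match")

# In an R string literal, "\b" is one character: backspace (code point 8)
print(nchar("\b"))
print(utf8ToInt("\b"))

# "\\b" is two characters, a backslash and "b", which the regex engine reads as \b
print(nchar("\\b"))

# Single backslash: the pattern is backspace + "is" + backspace, so nothing matches
single <- grepl("\bis\b", testLines, perl = TRUE)

# Doubled backslash: the regex engine receives \bis\b and matches the whole word
double <- grepl("\\bis\\b", testLines, perl = TRUE)

# Raw string (assumes R >= 4.0.0): backslashes are passed through unchanged
raw <- grepl(r"(\bis\b)", testLines, perl = TRUE)

print(single)
print(double)
print(raw)
```

The doubled-backslash form works in any R version, so it is the safer choice if the code may run on older installations.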

kohske answered Oct 26 '22 00:10