Using regex in R to find strings as whole words (but not strings as part of words)

Tags: regex, r

I'm searching for the right regular expression. The following

t1 = c("IGF2, IGF2AS, INS, TH", "TH", "THZH", "ZGTH")
grep("TH", t1, value=T)

returns all elements of t1, but only the first and second are correct. How can I return only the entries that contain TH as a whole word?

Hans asked Aug 29 '11 08:08


2 Answers

You need to add word boundary anchors (\b) around your search string so that only entire words are matched, i.e. words surrounded by non-word characters or by the start/end of the string. Here "word character" means \w: a letter, a digit, or an underscore.

Try

grep("\\bTH\\b", t1, value=TRUE)
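
As a quick check of the word-boundary pattern, here is a small sketch that runs it against the vector from the question, assuming R's default regex engine. The strings "TH_1" and "TH-1" are made-up inputs, not part of the question; they are there only to illustrate that an underscore counts as a word character while a hyphen does not.

```r
# Vector from the question
t1 <- c("IGF2, IGF2AS, INS, TH", "TH", "THZH", "ZGTH")

# \\b on both sides: TH must stand alone as a word
whole <- grep("\\bTH\\b", t1, value = TRUE)
print(whole)

# grepl() reports the same matches as a logical vector
hits <- grepl("\\bTH\\b", t1)
print(hits)

# Hypothetical extra inputs: "_" is a word character, "-" is not,
# so only "TH-1" has a boundary right after TH
extra <- grepl("\\bTH\\b", c("TH_1", "TH-1"))
print(extra)
```

With this pattern only the first two elements of t1 should come back; "THZH" and "ZGTH" are skipped because TH is directly adjacent to another letter.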
Tim Pietzcker answered Oct 12 '22 01:10


You can use \< and \> in a regexp to match at the beginning and end of a word, respectively.

grep("\\<TH\\>", t1, value=TRUE)
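
To see what each anchor does on its own, here is a minimal sketch using the vector from the question. It assumes R's default regex engine; these anchors may behave differently with perl = TRUE, so treat this as an illustration rather than a general guarantee.

```r
# Vector from the question
t1 <- c("IGF2, IGF2AS, INS, TH", "TH", "THZH", "ZGTH")

# Both anchors: TH must be a complete word
both <- grep("\\<TH\\>", t1, value = TRUE)

# Start-of-word anchor only: TH may be followed by more letters ("THZH")
starts <- grep("\\<TH", t1, value = TRUE)

# End-of-word anchor only: TH may be preceded by more letters ("ZGTH")
ends <- grep("TH\\>", t1, value = TRUE)

print(both)
print(starts)
print(ends)
```

Using both anchors should give the same two elements as the \b version, while each anchor alone lets one of the unwanted entries through.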

Anatoliy answered Oct 12 '22 01:10