 

Check for a list of strings (words) in a text (phrase)

Tags:

r

Is there an elegant way, other than looping, to test whether any word from a list is found in a phrase? I'm thinking of something like a list comprehension or one of the apply functions. Ex:

words <- c("word1", "word2", "word3")
text <- "This is a text made off of word1 and possibly word2 and so on."

The output should return TRUE if any of the words is found in text, and should indicate which word was found.

flamenco asked Dec 26 '22 08:12



2 Answers

grepl to the rescue.

sapply(words, grepl, text)

# word1 word2 word3 
#  TRUE  TRUE FALSE

This considers each element of words in turn and returns a named logical vector (TRUE if the word appears in text, and FALSE if not).
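To answer both parts of the question (is any word found, and which ones), the named logical vector can be reduced with any() and used to index its own names. This is a minimal sketch built on the call above; the variable names found, any_found and which_found are illustrative, and fixed = TRUE is an addition that assumes the words should be matched as literal strings rather than as regular expressions.

```r
words <- c("word1", "word2", "word3")
text <- "This is a text made off of word1 and possibly word2 and so on."

# One logical per word, named by the word itself.
# fixed = TRUE (an assumption) treats each word as a literal string, not a regex.
found <- sapply(words, grepl, text, fixed = TRUE)

# TRUE if at least one of the words appears in text
any_found <- any(found)

# Character vector of the words that appear in text
which_found <- names(found)[found]

print(any_found)
print(which_found)
```

Drop fixed = TRUE if the elements of words are meant to be regular expressions.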

If you want to ensure that the exact words are sought, then you can use:

sapply(words, function(x) grepl(sprintf('\\b%s\\b', x), text))

This will prevent word1 from returning TRUE when text has sword123 but lacks word1. It might make less sense though if words has multi-word elements.
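The difference between the two calls can be seen with a phrase that contains sword123 but not word1 on its own. The text2 string and the names plain and bounded below are made up for illustration.

```r
words <- c("word1", "word2")
# Hypothetical phrase: "word1" occurs only inside "sword123"
text2 <- "A sword123 appears here, and word2 as well."

# Plain substring-style match: "word1" matches inside "sword123"
plain <- sapply(words, grepl, text2)

# Word-boundary match: "word1" must stand alone as a word
bounded <- sapply(words, function(x) grepl(sprintf("\\b%s\\b", x), text2))

print(plain)    # word1 and word2 both TRUE
print(bounded)  # word1 FALSE, word2 TRUE
```

Because the words are pasted into a regular expression here, any element containing regex metacharacters (such as . or +) would need escaping first.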

jbaums answered Dec 27 '22 21:12



Look at the stringr package. I think the function you need is str_detect or str_locate_all. It's easy to use either of them inside sapply.

library(stringr)

str_detect(text, words)

str_locate_all(text, words)
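As a rough sketch of how these two calls could be used to answer the original question: str_detect is vectorised over its pattern argument, so it returns one logical per word without needing sapply, and str_locate_all returns one start/end matrix per word. The names hits, matched and locs are illustrative, and the patterns are treated as regular expressions by default.

```r
library(stringr)

words <- c("word1", "word2", "word3")
text <- "This is a text made off of word1 and possibly word2 and so on."

# One logical per element of words
hits <- str_detect(text, words)

# TRUE if any word was found, and the words that were found
any(hits)
matched <- words[hits]

# List with one matrix per word; each row gives the start and end of a match.
# A word with no match yields a matrix with zero rows.
locs <- str_locate_all(text, words)
names(locs) <- words

print(matched)
print(locs)
```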

Philippe answered Dec 27 '22 21:12
