
Detect multiple strings with dplyr and stringr

Tags: r, dplyr, stringr

I'm trying to combine dplyr and stringr to detect multiple patterns in a dataframe. I want to use dplyr as I want to test a number of different columns.

Here's some sample data:

library(dplyr)
library(stringr)
test.data <- data.frame(item = c("Apple", "Bear", "Orange", "Pear", "Two Apples"))
fruit <- c("Apple", "Orange", "Pear")
test.data
        item
1      Apple
2       Bear
3     Orange
4       Pear
5 Two Apples

What I would like to use is something like:

test.data <- test.data %>% mutate(is.fruit = str_detect(item, fruit))

and receive

        item is.fruit
1      Apple        1
2       Bear        0
3     Orange        1
4       Pear        1
5 Two Apples        1

A very simple test on a single string works:

> str_detect("Apple", fruit)
[1]  TRUE FALSE FALSE
> str_detect("Bear", fruit)
[1] FALSE FALSE FALSE

But I can't get this to work over the column of the dataframe, even without dplyr:

> test.data$is.fruit <- str_detect(test.data$item, fruit)
Error in check_pattern(pattern, string) : 
  Lengths of string and pattern not compatible

Does anyone know how to do this?

r.bot asked Oct 30 '14 17:10



2 Answers

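str_detect is vectorised over both string and pattern, so it pairs them element by element: the pattern must have length 1 or the same length as the string vector (here there are 5 items and 3 patterns, hence the error). It does not test every string against every pattern. Either turn the patterns into one regex using paste(..., collapse = '|') or use any: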

sapply(test.data$item, function(x) any(sapply(fruit, str_detect, string = x)))
# Apple       Bear     Orange       Pear Two Apples
#  TRUE      FALSE       TRUE       TRUE       TRUE

str_detect(test.data$item, paste(fruit, collapse = '|'))
# [1]  TRUE FALSE  TRUE  TRUE  TRUE
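
Since the question asks for a dplyr version with a 1/0 column, here is a minimal sketch of the collapsed-regex approach inside mutate(). The name fruit.regex is introduced only for illustration, and the sketch assumes the patterns contain no regex metacharacters (such as "." or "("), which would otherwise need escaping:

```r
library(dplyr)
library(stringr)

test.data <- data.frame(item = c("Apple", "Bear", "Orange", "Pear", "Two Apples"),
                        stringsAsFactors = FALSE)
fruit <- c("Apple", "Orange", "Pear")

# Collapse the patterns into a single alternation: "Apple|Orange|Pear"
fruit.regex <- paste(fruit, collapse = "|")

# str_detect() is applied to the whole column at once; as.integer() turns
# TRUE/FALSE into the 1/0 coding shown in the question
result <- test.data %>%
  mutate(is.fruit = as.integer(str_detect(item, fruit.regex)))

print(result)
```

Because the pattern has length 1, no rowwise() step is needed, and the same mutate() call can be repeated for other columns.
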
Robert Krzyzanowski answered Oct 25 '22 02:10


This simple approach works fine for EXACT matches:

test.data %>% mutate(is.fruit = item %in% fruit)
# A tibble: 5 x 2
        item is.fruit
       <chr>    <lgl>
1      Apple     TRUE
2       Bear    FALSE
3     Orange     TRUE
4       Pear     TRUE
5 Two Apples    FALSE

This approach works for partial matching (which is the question asked):

test.data %>% 
  rowwise() %>% 
  mutate(is.fruit = sum(str_detect(item, fruit)))

Source: local data frame [5 x 2]
Groups: <by row>

# A tibble: 5 x 2
        item is.fruit
       <chr>    <int>
1      Apple        1
2       Bear        0
3     Orange        1
4       Pear        1
5 Two Apples        1
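
Note that sum() counts how many patterns match, so an item containing two different fruits would get 2 rather than 1. A small sketch of the difference, using any() for a strict yes/no flag; the "Apple Pear" row and the column name n.fruit are made up here purely to illustrate the point:

```r
library(dplyr)
library(stringr)

# "Apple Pear" is a hypothetical extra row that matches two patterns
test.data <- data.frame(item = c("Apple", "Bear", "Two Apples", "Apple Pear"),
                        stringsAsFactors = FALSE)
fruit <- c("Apple", "Orange", "Pear")

result <- test.data %>%
  rowwise() %>%
  mutate(n.fruit  = sum(str_detect(item, fruit)),    # number of patterns matched
         is.fruit = any(str_detect(item, fruit))) %>% # TRUE if at least one matched
  ungroup()

print(result)
```

Wrap the any() call in as.integer() if a 1/0 column is preferred over TRUE/FALSE.
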
Henrik answered Oct 25 '22 02:10