 

%like% with multiple patterns in R

Is it possible to use multiple patterns with %like% in a nested ifelse()? If not, what would be the alternative?

library(data.table)
library(dplyr)

fruits <- c("apple", "pineapple", "grape", "avocado", "banana")
color <- c("red", "yellow", "purple", "green", "yellow")
mydata <- data.frame(fruits = fruits, color = color)

mydata %>%
  mutate(group = ifelse(fruits %like% c("%pple%", "%vocado%"), "group 1",
                        ifelse(fruits %like% c("%anana%", "%grape%"), "group 2", "group 3")))

When I try the code above, I get the following warnings:

Warning messages:
1: In grep(pattern, levels(vector)) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In grep(pattern, levels(vector)) :
  argument 'pattern' has length > 1 and only the first element will be used

Any guidance is appreciated. Thank you!

Danielle Travassos asked Jan 25 '23 08:01



1 Answer

data.table's like() function and its operator versions %like%, %ilike%, and %flike% accept only a single pattern. With %like% and %ilike%, however, that pattern is a regular expression, so you can use alternation, which is expressed by a vertical bar (|):

library(data.table)
library(dplyr)
mydata %>%
  mutate(group = ifelse(fruits %ilike% "apple|avocado", "group 1",
                        ifelse(fruits %ilike% "banana|grape", "group 2", "group 3")))
     fruits  color   group
1     apple    red group 1
2 pineapple yellow group 1
3     grape purple group 2
4   avocado  green group 1
5    banana yellow group 2

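So, group 1 matches any string in which either apple or avocado appears anywhere. The % wildcards are therefore not required: a regular expression matches substrings by default, and % has no special meaning in a regular expression, so it would be searched for as a literal character.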
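The difference between SQL-style wildcards and regular expressions can be checked directly. The following is a minimal sketch, assuming data.table is installed; the variable names (with_percent, without_percent, anchored) are made up for illustration:

```r
library(data.table)

fruits <- c("apple", "pineapple", "grape", "avocado", "banana")

# "%" is an ordinary character in a regular expression,
# so this pattern looks for a literal "%" and matches nothing here
with_percent <- fruits %like% "%pple%"

# Without "%", the pattern matches "pple" anywhere in the string
without_percent <- fruits %like% "pple"

# Anchors restrict where a match may occur: "^apple" must start the string
anchored <- fruits %like% "^apple"

print(data.frame(fruits, with_percent, without_percent, anchored))
```

If a match should be limited to the start or end of the string, the regex anchors ^ and $ take over the role that a missing leading or trailing % plays in SQL.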

Note that %ilike% has been used instead of %like%. %ilike% is a convenience function for case-insensitive pattern matching that became available with data.table v1.12.4 (on CRAN since 03 Oct 2019).

%ilike% will also match the word Apple (with a capital A).
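To see the effect of case sensitivity side by side, here is a small sketch, assuming data.table v1.12.4 or later; the test vector x is invented for this example:

```r
library(data.table)

# Illustrative input with mixed capitalisation (not from the question's data)
x <- c("Apple", "apple", "AVOCADO", "grape")

# %like% is case-sensitive: only the lower-case "apple" matches
case_sensitive <- x %like% "apple|avocado"

# %ilike% ignores case: "Apple" and "AVOCADO" match as well
case_insensitive <- x %ilike% "apple|avocado"

print(data.frame(x, case_sensitive, case_insensitive))
```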

Of course, case_when() is a good alternative to nested ifelse(), as suggested by r2evans:

mydata %>%
  mutate(group = case_when(fruits %ilike% "apple|avocado" ~ "group 1",
                           fruits %ilike% "banana|grape" ~ "group 2", 
                           TRUE ~ "group 3"))
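If the patterns are kept in character vectors, as in the question, one possible approach is to collapse each vector into a single alternation with paste(). This is only a sketch: group1_patterns, group2_patterns, and the derived regex variables are hypothetical names, and it assumes the patterns contain no regex metacharacters that would need escaping.

```r
library(data.table)
library(dplyr)

mydata <- data.frame(
  fruits = c("apple", "pineapple", "grape", "avocado", "banana"),
  color  = c("red", "yellow", "purple", "green", "yellow")
)

# Hypothetical pattern vectors, named here only for illustration
group1_patterns <- c("pple", "vocado")
group2_patterns <- c("anana", "grape")

# paste(collapse = "|") joins the elements into one regex, e.g. "pple|vocado"
group1_regex <- paste(group1_patterns, collapse = "|")
group2_regex <- paste(group2_patterns, collapse = "|")

result <- mydata %>%
  mutate(group = case_when(fruits %ilike% group1_regex ~ "group 1",
                           fruits %ilike% group2_regex ~ "group 2",
                           TRUE ~ "group 3"))

print(result)
```

Because each %ilike% call now receives a length-one pattern, the "argument 'pattern' has length > 1" warning from the question should no longer appear.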
Uwe answered Feb 08 '23 14:02
