
grepl in R to find matches to any of a list of character strings

Is it possible to use a grepl argument when referring to a list of values, maybe using the %in% operator? I want to take the data below and if the animal name has "dog" or "cat" in it, I want to return a certain value, say, "keep"; if it doesn't have "dog" or "cat", I want to return "discard".

data <- data.frame(animal = sample(c("cat", "dog", "bird", "doggy", "kittycat"), 50, replace = TRUE))

Now, if I were to do this by exact matching against values such as "cat" and "dog", I could use the following approach:

matches <- c("cat", "dog")
data$keep <- ifelse(data$animal %in% matches, "Keep", "Discard")

But grep and grepl use only the first element of the pattern vector:

data$keep <- ifelse(grepl(matches, data$animal), "Keep","Discard") 

returns

Warning message:
In grepl(matches, data$animal) :
  argument 'pattern' has length > 1 and only the first element will be used

Note: I found this thread in my search, but its approach doesn't appear to work for me: "grep using a character vector with multiple patterns"

Marc Tulla asked Aug 19 '14 19:08




1 Answer

You can use the alternation ("or") operator | inside the regular expression passed to grepl.

ifelse(grepl("dog|cat", data$animal), "keep", "discard")
# [1] "keep"    "keep"    "discard" "keep"    "keep"    "keep"    "keep"    "discard"
# [9] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "discard" "keep"
#[17] "discard" "keep"    "keep"    "discard" "keep"    "keep"    "discard" "keep"
#[25] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"
#[33] "keep"    "discard" "keep"    "discard" "keep"    "discard" "keep"    "keep"
#[41] "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"    "keep"
#[49] "keep"    "discard"

The regular expression dog|cat tells the regular expression engine to look for either "dog" or "cat", so grepl returns TRUE for every element that contains at least one of them (including "doggy" and "kittycat").
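If the strings to match already live in a vector such as matches from the question, one way to avoid typing the pattern by hand is to collapse the vector with paste(). The sketch below is only an illustration: the animal vector is a small fixed stand-in for the randomly sampled data, and the names pattern, keep and hits are made up for this example. The second variant assumes you want the strings treated literally instead of as regular expressions.

```r
# Hypothetical stand-in for data$animal (fixed values, not the random sample)
animal  <- c("cat", "dog", "bird", "doggy", "kittycat")
matches <- c("cat", "dog")

# Collapse the vector into a single alternation pattern, here "cat|dog"
pattern <- paste(matches, collapse = "|")

# TRUE wherever an element contains any of the strings
keep <- ifelse(grepl(pattern, animal), "keep", "discard")
# keep is now: "keep" "keep" "discard" "keep" "keep"

# Variant for literal strings: run grepl once per string with fixed = TRUE
# and combine the logical vectors with an element-wise OR
hits <- Reduce(`|`, lapply(matches, function(m) grepl(m, animal, fixed = TRUE)))
```

The collapsed pattern is still a regular expression, so strings containing characters such as . or ( would be interpreted as regex syntax; the fixed = TRUE variant sidesteps that at the cost of one grepl call per string.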

Rich Scriven answered Sep 24 '22 17:09