grep in R using a character vector with multiple patterns with same order as vector

Tags:

r

grepl

I have a vector of patterns and a vector of strings that I want to search with grep, and I want the matches returned in the order of the patterns. I solved it with a loop, but I'm wondering whether there is a better way of doing it.

For example:

to_match <- c("KZB8","KBB9","KBC9","KZA9","KZB2","KZB5","KZB6")
vectorA <- c("RuL_KZA9","RuL_KZB9","RuL_KZA5","RuL_KZC6","RuL_KZB8")

I solved it like this:

matching <- c()
for (pattern in to_match) {
  # value = TRUE returns the matching strings instead of their indices
  hits <- grep(pattern, vectorA, value = TRUE)
  matching <- c(matching, hits)
}
> matching
[1] "RuL_KZB8" "RuL_KZA9"

BTW, I saw the great answers here: grep using a character vector with multiple patterns

But as you will see, the problem with:

grep(paste(to_match, collapse = "|"),vectorA, value = T)
[1] "RuL_KZA9" "RuL_KZB8"

is that the result follows the order of the elements in vectorA, not the order of the patterns in to_match.
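
A minimal sketch of why the order is lost, using the vectors from the question: grep() reports matches as positions in the vector being searched, in ascending order, so a single combined pattern can only return hits in vectorA order. The names combined and idx are introduced here for illustration only.

```r
to_match <- c("KZB8","KBB9","KBC9","KZA9","KZB2","KZB5","KZB6")
vectorA <- c("RuL_KZA9","RuL_KZB9","RuL_KZA5","RuL_KZC6","RuL_KZB8")

# One alternation pattern built from all entries of to_match
combined <- paste(to_match, collapse = "|")

# Without value = TRUE, grep() returns positions in vectorA, in ascending order
idx <- grep(combined, vectorA)
idx
# [1] 1 5

# Subsetting by those positions therefore follows vectorA, not to_match
vectorA[idx]
# [1] "RuL_KZA9" "RuL_KZB8"
```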

Thanks in advance for your ideas for more efficient code.

Niko

N. Lichilín asked Dec 18 '18 17:12


1 Answer

Try lapply:

unlist(lapply(to_match, grep, vectorA, value = TRUE))
## [1] "RuL_KZB8" "RuL_KZA9"

or

unlist(sapply(to_match, grep, vectorA, value = TRUE))
##       KZB8       KZA9 
## "RuL_KZB8" "RuL_KZA9" 
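
If this is needed in more than one place, the lapply version could be wrapped in a small function. The sketch below is only one possible way to do that: grep_in_order is a hypothetical name, not something from the answer or from base R, and the call to unique() is an added assumption that an element matching several patterns should appear only once (the loop in the question would repeat it).

```r
# Hypothetical helper: return matches in the order of `patterns`.
# Extra arguments (e.g. fixed = TRUE) are passed through to grep().
grep_in_order <- function(patterns, x, ...) {
  hits <- lapply(patterns, function(p) grep(p, x, value = TRUE, ...))
  # Flatten the per-pattern results, drop names, and drop repeated elements
  unique(unlist(hits, use.names = FALSE))
}

to_match <- c("KZB8","KBB9","KBC9","KZA9","KZB2","KZB5","KZB6")
vectorA <- c("RuL_KZA9","RuL_KZB9","RuL_KZA5","RuL_KZC6","RuL_KZB8")

result <- grep_in_order(to_match, vectorA)
result
# [1] "RuL_KZB8" "RuL_KZA9"

# fixed = TRUE treats each pattern as a literal string rather than a regex
result_fixed <- grep_in_order(to_match, vectorA, fixed = TRUE)
```

Passing fixed = TRUE may be worth considering when the patterns are plain identifiers like these, since no regex metacharacters are involved.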
G. Grothendieck answered Oct 22 '22 10:10