grep using a character vector with multiple patterns

Tags: regex, r

I am trying to use grep to test whether any of the strings in one vector are present in another vector, and to output the values that are present (the matching patterns).

I have a data frame like this:

FirstName Letter
Alex      A1
Alex      A6
Alex      A7
Bob       A1
Chris     A9
Chris     A6

I have a vector of string patterns to be found in the "Letter" column, for example: c("A1", "A9", "A6").

I would like to check whether any of the strings in the pattern vector are present in the "Letter" column. If they are, I would like to output the unique values that matched.

The problem is, I don't know how to use grep with multiple patterns. I tried:

matches <- unique(grep("A1| A9 | A6", myfile$Letter, value=TRUE, fixed=TRUE))

But it gives me 0 matches, which is not correct. Any suggestions?

user971102 asked Sep 29 '11 12:09




1 Answer

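In addition to @Marek's comment about not including fixed=TRUE, you also need to remove the spaces from your regular expression. It should be "A1|A9|A6". With fixed=TRUE the pattern is matched as a literal string, so | is not treated as "or"; and in regex mode the spaces are part of each alternative, so " A9 " only matches a value that has a space on both sides of A9.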
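To make the two problems concrete, here is a small sketch that runs the three variants of the call side by side. The vector letters_col and the result names are made up for illustration; the values simply mirror the "Letter" column shown in the question.

```r
# Hypothetical stand-in for myfile$Letter, copied from the question's table
letters_col <- c("A1", "A6", "A7", "A1", "A9", "A6")

# 1. fixed = TRUE: the whole pattern is one literal string, so nothing matches
literal <- grep("A1| A9 | A6", letters_col, value = TRUE, fixed = TRUE)

# 2. Regex mode, spaces kept: the alternatives are "A1", " A9 " and " A6",
#    so only the values containing "A1" match
with_spaces <- grep("A1| A9 | A6", letters_col, value = TRUE)

# 3. Regex mode, spaces removed: all three alternatives can match
no_spaces <- grep("A1|A9|A6", letters_col, value = TRUE)

print(literal)      # character(0)
print(with_spaces)  # "A1" "A1"
print(no_spaces)    # "A1" "A6" "A1" "A9" "A6"
```

Only the third call finds every value, which is why both the fixed argument and the spaces have to go.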

You also mention that there are lots of patterns. Assuming that they are in a vector

toMatch <- c("A1", "A9", "A6") 

Then you can create your regular expression directly using paste and collapse = "|".

matches <- unique(grep(paste(toMatch, collapse="|"), myfile$Letter, value=TRUE))
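As an end-to-end sketch, the following rebuilds the question's data frame by hand (the construction of myfile is an assumption; the original presumably read it from a file) and applies the paste/collapse approach to it.

```r
# Assumed reconstruction of the data frame shown in the question
myfile <- data.frame(
  FirstName = c("Alex", "Alex", "Alex", "Bob", "Chris", "Chris"),
  Letter    = c("A1", "A6", "A7", "A1", "A9", "A6"),
  stringsAsFactors = FALSE
)

toMatch <- c("A1", "A9", "A6")

# Join the patterns into one regular expression: "A1|A9|A6"
pattern <- paste(toMatch, collapse = "|")

# value = TRUE returns the matching strings rather than their indices
matches <- unique(grep(pattern, myfile$Letter, value = TRUE))
print(matches)  # "A1" "A6" "A9"

# Alternative, not part of the answer above: exact matching with %in%
exact <- unique(myfile$Letter[myfile$Letter %in% toMatch])
print(exact)    # "A1" "A6" "A9"
```

Note that the regex version matches substrings, so a pattern such as "A1" would also match a value like "A10". If whole-value matches are what you need, the %in% line may be a better fit, at the cost of losing regular-expression features.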
Brian Diggs answered Sep 20 '22 04:09
