
Partial string matching with grep and regular expressions

Tags: string, regex, r

I have a vector of three-character strings, and I'm trying to write a command that will find which members of the vector have a particular letter as the second character.

As an example, say I have this vector of 3-letter strings:

example = c("AWA","WOO","AZW","WWP")

I can use grepl and glob2rx to find strings with W as the first or last character.

> grepl(glob2rx("W*"),example)
[1] FALSE  TRUE FALSE  TRUE

> grepl(glob2rx("*W"),example)
[1] FALSE FALSE  TRUE FALSE

However, I don't get the right result when I try the same approach with glob2rx("*W*"):

> grepl(glob2rx("*W*"),example)
[1] TRUE TRUE TRUE TRUE

I am sure my understanding of regular expressions is lacking, but this seems like a pretty straightforward problem and I can't find the solution. I'd really love some assistance!

For future reference, I'd also like to know whether this extends to longer strings. Say I have strings that are 5 characters long: could I use grepl to return the strings where W is the third character?

Starcalibre asked Jan 03 '14 02:01



1 Answer

I would have thought that this was the regex way:

>  grepl("^.W",example)
[1]  TRUE FALSE FALSE  TRUE
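
For context on why the glob attempt in the question matched everything, it may help to look at the regular expressions that glob2rx() generates. A small sketch using the question's example vector (the expected values in the comments assume glob2rx()'s default arguments):

```r
example <- c("AWA", "WOO", "AZW", "WWP")

# glob2rx() turns a shell-style wildcard into an anchored regex.
glob2rx("W*")    # "^W"    : W at the start
glob2rx("*W")    # "^.*W$" : W at the end
glob2rx("*W*")   # "^.*W"  : W anywhere, so every element here matches

# In glob syntax, ? stands for exactly one character,
# so "?W*" is the glob counterpart of the regex "^.W".
glob2rx("?W*")                   # "^.W"
grepl(glob2rx("?W*"), example)   # TRUE FALSE FALSE TRUE
```

So the glob route can work too, as long as ? is used for "exactly one character" rather than *, which stands for any number of characters, including none.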

If you want W at a particular, prespecified position, state how many characters come before it:

>  grepl("^.{1}W",example)
[1]  TRUE FALSE FALSE  TRUE

That form lets you build the pattern programmatically:

pos <- 2        # 1-based position where W must appear
n <- pos - 1    # number of arbitrary characters before it
grepl(paste0("^.{", n, "}W"), example)
[1]  TRUE FALSE FALSE  TRUE
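
For the follow-up about 5-character strings with W as the third character, the same idea could be wrapped in a small function. This is only a sketch: `char_at` is a hypothetical helper name, not a base R function, and the vector `five` is made-up test data.

```r
# Hypothetical helper: TRUE where the letter `ch` sits at position `pos` of each string.
# Note that `ch` is pasted into a regex, so a metacharacter such as "." would
# need escaping before being passed in.
char_at <- function(x, ch, pos) {
  grepl(paste0("^.{", pos - 1, "}", ch), x)
}

five <- c("ABWDE", "WABCD", "ABCDW", "WWWAA")  # made-up 5-character strings
char_at(five, "W", 3)
# [1]  TRUE FALSE FALSE  TRUE

# Regex-free alternative: compare the character at that position directly.
substr(five, 3, 3) == "W"
# [1]  TRUE FALSE FALSE  TRUE
```

The substr() comparison avoids regular expressions altogether, which may be simpler when the position is fixed and the target is a single literal character.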
IRTFM answered Sep 29 '22 12:09