
Extract pattern from string in R without distinguishing between upper and lower case letters

Tags: string, r, extract

This is a toy example. I want to search within a and extract the colors that are listed in b. A color should be extracted even if its case in a differs from its case in b (e.g., "blue" vs. "Blue"). However, the output should show each color as it is written in a.

So the answer I would like to get is "Red" NA "blue".

library(stringr)
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")
str_extract(a, b)  # "Red" NA NA

I used str_extract from 'stringr', but would be happy to use another function/package (e.g., grep).

milan asked Jun 14 '16 03:06


1 Answer

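We can do this in base R by matching against a lower-cased copy of a and extracting the matched positions from the original string: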

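unlist(sapply(tolower(b), function(x) {
  # Find matches in the lower-cased text, but extract them from the original a
  x1 <- regmatches(a, gregexpr(x, tolower(a)))
  # Replace an empty match (character(0)) with NA
  replace(x1, lengths(x1) == 0, NA)
}), use.names = FALSE)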
# "Red"     NA "blue" 
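
A shorter base R variant, offered as a sketch rather than a definitive solution: regexpr() accepts ignore.case = TRUE, so the lower-casing step can be skipped. The helper name extract_ci is hypothetical, and unlike the gregexpr() version above it returns only the first match per pattern.

```r
a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# extract_ci is a hypothetical helper name, not part of base R.
# For each pattern, return the first case-insensitive match exactly as it
# is written in the text, or NA when the pattern does not occur.
extract_ci <- function(text, patterns) {
  vapply(patterns, function(p) {
    m <- regmatches(text, regexpr(p, text, ignore.case = TRUE))
    if (length(m) == 0) NA_character_ else m
  }, character(1), USE.NAMES = FALSE)
}

extract_ci(a, b)
# "Red" NA    "blue"
```

vapply() with character(1) guarantees a character vector of the same length as b, which avoids the unlist() step.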

Or, inspired by @leerssej's answer, with stringr's case-insensitive fixed() modifier:

library(stringr)
str_extract(a, fixed(b, ignore_case=TRUE))
#[1] "Red"  NA     "blue"
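
Note that fixed() matches substrings, so "Red" would also be found inside a word such as "bored". If whole-word matches are wanted, one possible sketch uses stringr's regex() modifier with word boundaries. It assumes the entries of b contain no regex metacharacters; the names pat, res and res2 are only illustrative.

```r
library(stringr)

a <- "She has Red hair and blue eyes"
b <- c("Red", "Yellow", "Blue")

# Wrap each colour in word boundaries and match case-insensitively.
# Assumption: the entries of b contain no regex metacharacters.
pat <- regex(paste0("\\b", b, "\\b"), ignore_case = TRUE)
res <- str_extract(a, pat)
res
# "Red" NA    "blue"

# A substring hit such as "red" inside "bored" is not returned:
res2 <- str_extract("She looks bored", regex("\\bRed\\b", ignore_case = TRUE))
res2
# NA
```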
akrun answered Oct 27 '22 21:10