
Search string in variable and return the matched string

I need help matching a few strings stored in a vector against addresses stored in a column of a data frame (data.table). My data set is quite large, around 1 million records, so I prefer using data.table.

Below is a dummy sample of the data and the vector:

my <- data.frame(add=c("50, nutan nagar Mum41","50, nutan Mum88 Maha","77, amar nagar Blr79 Bang","54, veer build Chennai3242","amar 755 Blr 400018"))

vec1 <- c("Mum","Blr","Chennai")

I need to search each address in the column add for each of the strings in vec1. If an address contains any of the strings from vec1, the matched string should be returned in a new column result. In case of multiple matches, the first matched value should be returned, i.e. if both "Mum" and "Blr" are found in a single address, the result should be "Mum".

Based on the dummy data, the expected result would be:

my$result <- c("Mum","Mum","Blr","Chennai","Blr")

I tried using grep / grepl but they give the error "argument 'pattern' has length > 1 and only the first element will be used"

I also tried str_match, but I only get TRUE / FALSE for each string in the vector that is found in the address, not the matched value itself.

How can we achieve this?

user1412 asked Jan 03 '23 15:01


1 Answer

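We can use str_extract with a single pattern built by collapsing vec1 with "|". It returns the first match found in each address, or NA when nothing matches.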

library(stringr)
str_extract(my$add, paste(vec1, collapse="|"))
#[1] "Mum"     "Mum"     "Blr"     "Chennai" "Blr"   
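
Note that the alternation pattern returns whichever string occurs earliest in the address, which is not necessarily the one listed first in vec1. If the order of vec1 should decide instead, a minimal base R sketch could look like the following. Here match_by_priority is a hypothetical helper name, addr is made-up sample data, and fixed = TRUE assumes the entries of vec1 are literal strings rather than regular expressions.

```r
# Hypothetical helper: return the first entry of `keys` (in `keys` order)
# that occurs in each element of `x`, or NA when none occurs.
match_by_priority <- function(x, keys) {
  result <- rep(NA_character_, length(x))
  # Walk the keys from last to first so earlier keys overwrite later ones.
  for (k in rev(keys)) {
    result[grepl(k, x, fixed = TRUE)] <- k
  }
  result
}

vec1 <- c("Mum", "Blr", "Chennai")
# Made-up addresses: the first contains "Blr" before "Mum", the second has no match.
addr <- c("12, Blr road Mum41", "no city here", "54, veer build Chennai3242")

match_by_priority(addr, vec1)
# returns c("Mum", NA, "Chennai")
```

The loop runs once per entry of vec1 and each grepl call is vectorised over the addresses, so this should stay workable for a short vec1 even on a large table, though I have not timed it on 1 million rows.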

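Or with base R (note that regmatches drops addresses that have no match, so the result can be shorter than my$add):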

regmatches(my$add, regexpr(paste(vec1, collapse="|"), my$add))
#[1] "Mum"     "Mum"     "Blr"     "Chennai" "Blr"    
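
Because regmatches returns only the matched pieces, assigning its output straight to a new column fails as soon as one address has no match. A possible way to keep the result aligned with the rows, sketched here on a made-up variant of the sample data that includes a non-matching address:

```r
# Made-up sample with one address that contains none of the strings.
my <- data.frame(
  add = c("50, nutan nagar Mum41", "no city here", "amar 755 Blr 400018"),
  stringsAsFactors = FALSE
)
vec1 <- c("Mum", "Blr", "Chennai")

pattern <- paste(vec1, collapse = "|")
m <- regexpr(pattern, my$add)

# regexpr() gives -1 where nothing matched (NA for NA input); regmatches()
# returns only the matched pieces, so put them back at the matching rows
# and leave the remaining rows as NA.
hit <- !is.na(m) & m > 0
my$result <- NA_character_
my$result[hit] <- regmatches(my$add, m)

my$result
# returns c("Mum", NA, "Blr")
```

With a data.table, the same vector could presumably be assigned by reference using := instead of my$result.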
akrun answered Jan 12 '23 07:01