I need some help matching a few strings stored in a vector against addresses stored in a column of a data frame (data.table). My dataset is quite large, around 1 million records, hence I prefer using data.table.
Below is a dummy sample of the data and the vector:
my <- data.frame(add=c("50, nutan nagar Mum41","50, nutan Mum88 Maha","77, amar nagar Blr79 Bang","54, veer build Chennai3242","amar 755 Blr 400018"))
vec1 <- c("Mum","Blr","Chennai")
I need to search for each of the strings in vec1 within each address in the column add. If any string from vec1 is found in the address, the matched string should be returned in a new column, result. In case of multiple matches, it should return the first matched value, i.e. if both "Mum" and "Blr" are found in a single address, it should return "Mum".
Based on the dummy data, expected result would be -
my$result <- c("Mum","Mum","Blr","Chennai","Blr")
I tried using grep / grepl, but they give the warning "argument 'pattern' has length > 1 and only the first element will be used".
I also tried using str_match, but I get TRUE / FALSE for each string in the vector that is found in the address, not the value itself.
How can we achieve this?
We can use str_extract from stringr, after collapsing vec1 into a single alternation pattern ("Mum|Blr|Chennai"):
library(stringr)
str_extract(my$add, paste(vec1, collapse="|"))
#[1] "Mum" "Mum" "Blr" "Chennai" "Blr"
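Since the question mentions data.table, the same call can probably be used to fill the result column by reference. The following is only a minimal sketch, assuming the data.table and stringr packages are installed; the names dt and pattern, and the extra sixth address with no city in it, are added here for illustration and are not part of the original data. str_extract returns NA where nothing matches, so the new column has one entry per row.

```r
library(data.table)
library(stringr)

# Sample data from the question as a data.table (dt is an illustrative name);
# the last address is an added example that contains none of the strings
dt <- data.table(add = c("50, nutan nagar Mum41", "50, nutan Mum88 Maha",
                         "77, amar nagar Blr79 Bang", "54, veer build Chennai3242",
                         "amar 755 Blr 400018", "12, no city here"))
vec1 <- c("Mum", "Blr", "Chennai")

# Build one alternation pattern: "Mum|Blr|Chennai"
pattern <- paste(vec1, collapse = "|")

# Add the column by reference; rows without a match get NA
dt[, result := str_extract(add, pattern)]
```

Building the pattern once and assigning with := avoids copying the table, which should matter at around 1 million rows.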
Or with base R
regmatches(my$add, regexpr(paste(vec1, collapse="|"), my$add))
#[1] "Mum" "Mum" "Blr" "Chennai" "Blr"
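Two caveats are worth keeping in mind. First, regmatches combined with regexpr drops addresses that have no match, so the result can be shorter than the input. Second, an alternation pattern returns whichever string appears leftmost in the address, which is not necessarily the first one in vec1. If the order of vec1 should decide, something like the base R sketch below may help. The helper name first_hit and the small add vector are hypothetical, made up for this illustration, and the strings are matched literally (fixed = TRUE) rather than as regular expressions.

```r
vec1 <- c("Mum", "Blr", "Chennai")
add  <- c("50, nutan nagar Mum41", "Blr road, Mum 400001", "no city here")

# Hypothetical helper: for each string in x, return the first element of
# keys (in the order of keys) that occurs in it, or NA if none occurs
first_hit <- function(x, keys) {
  res <- rep(NA_character_, length(x))
  # Walk keys from last to first so earlier keys overwrite later ones
  for (k in rev(keys)) {
    res[grepl(k, x, fixed = TRUE)] <- k
  }
  res
}

first_hit(add, vec1)
# [1] "Mum" "Mum" NA
```

The second address contains "Blr" before "Mum", yet the helper returns "Mum" because it comes first in vec1. The loop runs once per search string, not once per row, so it stays vectorised over the addresses.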