Match two lists in R, one with partial strings and one with full strings: return the whole string on a match, and return each match only once.
Let's say I have a CSV file in which each row holds a long string (the long list). I shorten each string using substr, then drop any duplicates using unique. Now I want to compare the long string list df12 with the unique short list df14, and for each short string in df14 that partially matches a string in df12, return the whole string from df12, once per short string.
This is df12 (the long string list):
[1] I like stackoverflow very much today
[2] I like stackoverflow much today
[3] I dont like stackoverflow very much today
[4] I dont like you!
[5] What?
df13 <- substr(df12, start = 1, stop = 13)
This is df13 (shortened strings, not yet unique):
[1] I like stacko
[2] I like stacko
[3] I dont like s
[4] I dont like y
[5] What?
df14 <- unique(df13)
This is df14 (shortened strings, unique after applying unique):
[1] I like stacko
[2] I dont like s
[3] I dont like y
[4] What?
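For reference, the steps above can be reproduced with a small sketch. It assumes the strings sit in a plain character vector instead of being read from the CSV file, and it assumes a cut-off of 13 characters, which is inferred from the shortened strings shown above.

```r
# Long strings; in practice these would come from the CSV file.
df12 <- c("I like stackoverflow very much today",
          "I like stackoverflow much today",
          "I dont like stackoverflow very much today",
          "I dont like you!",
          "What?")

# Assumed cut-off of 13 characters, inferred from the shortened strings shown.
df13 <- substr(df12, start = 1, stop = 13)

# Drop duplicate prefixes; unique() keeps the first occurrence of each, in order.
df14 <- unique(df13)
print(df14)
```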
This is the result that I want in the end:
[1] I like stackoverflow very much today
[2] I dont like stackoverflow very much today
[3] I dont like you!
[4] What?
Here is one approach. It matches every short string in df14 (called df2 here) against df12 (called df1 here) and returns all matching long strings. The result is a list named by the short strings, so you can see which short string matched which entries of df12:
df1 <- c('I like stackoverflow very much today', 'I like stackoverflow much today',
         'I dont like stackoverflow very much today', 'I dont like you!',
         'What?')
df2 <- c('I like stacko', 'I dont like s', 'I dont like y', 'What?')
# fixed = TRUE matches each short string literally, so the '?' is not read as a regex operator
sapply(df2, function(x) df1[grepl(x, df1, fixed = TRUE)])
$`I like stacko`
[1] "I like stackoverflow very much today" "I like stackoverflow much today"
$`I dont like s`
[1] "I dont like stackoverflow very much today"
$`I dont like y`
[1] "I dont like you!"
$`What?`
[1] "What?"
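The output above lists every match per short string, whereas the question asks for only one long string per short string. One way to get that, as a sketch and not the only option, is to keep the first long string that starts with each short string. The names first_match and first_by_prefix are illustrative, and the 13-character cut-off in the second variant is an assumption taken from the shortened strings in the question.

```r
df1 <- c("I like stackoverflow very much today",
         "I like stackoverflow much today",
         "I dont like stackoverflow very much today",
         "I dont like you!",
         "What?")
df2 <- c("I like stacko", "I dont like s", "I dont like y", "What?")

# For each short string, keep only the first long string that starts with it.
# startsWith() compares literally, so "?" is not treated as a regex operator.
# If nothing matches, indexing with [1] yields NA for that short string.
first_match <- vapply(df2,
                      function(p) df1[startsWith(df1, p)][1],
                      character(1),
                      USE.NAMES = FALSE)
print(first_match)

# Variant without the short list: keep the first long string for each
# 13-character prefix (the cut-off is an assumption, see above).
first_by_prefix <- df1[!duplicated(substr(df1, 1, 13))]
print(first_by_prefix)
```

Both variants keep the long strings in their original order. The second one avoids the matching step entirely, which may be preferable when the short list is derived from the long list anyway.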