Match two lists, one with partial strings and another with full string, return the whole string if match

Match two lists in R, one with partial strings and the other with full strings, and return the whole string on a match. Return each unique match only once.

Say I have a CSV file in which each row holds a long string (the long list). I shorten each string with substr, then drop the duplicate short strings with unique. I then want to compare the long string list df12 with the unique short list df14: for each short string in df14 that partially matches a string in df12, return the whole string from df12, once per short string.

This is df12 (long string list)

    [1] I like stackoverflow very much today
    [2] I like stackoverflow much today
    [3] I dont like stackoverflow very much today
    [4] I dont like you!
    [5] What? 

    df13 <- substr(df12, start = 1, stop = 13)

This is df13 (shortened strings - not unique)

    [1] I like stacko
    [2] I like stacko
    [3] I dont like s
    [4] I dont like y
    [5] What?

    df14 <- unique(df13)

This is df14 (shortened strings - unique strings after applying unique method)

    [1] I like stacko
    [2] I dont like s
    [3] I dont like y
    [4] What? 

This is the result that I want in the end

    [1] I like stackoverflow very much today
    [2] I dont like stackoverflow very much today
    [3] I dont like you!
    [4] What?
Elias EstatisticsEU asked Oct 18 '22 17:10

1 Answer

One approach is to match every short string in df14 against df12 and return all of its matches. The result is a list named by the short strings, so you can see which short string matched which entries of df12 (df1 and df2 below stand in for df12 and df14):

    df1 <- c('I like stackoverflow very much today', 'I like stackoverflow much today',
             'I dont like stackoverflow very much today', 'I dont like you!',
             'What?')
    df2 <- c('I like stacko',  'I dont like s', 'I dont like y', 'What?')

    # fixed = TRUE matches each short string literally, so the "?" in 'What?'
    # is not interpreted as a regular-expression quantifier
    sapply(df2, function(x) df1[grepl(x, df1, fixed = TRUE)])

    $`I like stacko`
    [1] "I like stackoverflow very much today" "I like stackoverflow much today"

    $`I dont like s`
    [1] "I dont like stackoverflow very much today"

    $`I dont like y`
    [1] "I dont like you!"

    $`What?`
    [1] "What?"
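
The approach above returns every match for each short string, whereas the question asks for a single whole string per short string. A minimal sketch of that variant follows. It assumes the wanted string is the first match in the order of the long list, and that each short string is a prefix of the long strings, so startsWith can be used instead of a pattern search. The names long_strings, prefix_len, first_match and first_by_prefix are illustrative and do not come from the question.

```r
# Long strings (the question's df12).
long_strings <- c("I like stackoverflow very much today",
                  "I like stackoverflow much today",
                  "I dont like stackoverflow very much today",
                  "I dont like you!",
                  "What?")

# Assumption: 13 characters, the length of the short strings shown in the question.
prefix_len <- 13

# Option 1: for each unique prefix, take the first long string that starts with it.
short_unique <- unique(substr(long_strings, 1, prefix_len))
first_match <- vapply(
  short_unique,
  function(p) long_strings[startsWith(long_strings, p)][1],
  character(1),
  USE.NAMES = FALSE
)

# Option 2: skip the lookup and keep the first long string seen for each prefix.
first_by_prefix <- long_strings[!duplicated(substr(long_strings, 1, prefix_len))]

print(first_match)
```

Both options should give the four strings listed as the desired result in the question. Option 2 may be the simpler choice when the short list is only ever derived from the long list with substr, since duplicated already marks the repeated prefixes.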
Gopala answered Oct 21 '22 17:10