
Extract best match from string-distance matrix

Tags:

r

I am having trouble extracting the best match from a string-distance matrix.

I am using the stringdist package to compute the string-distance matrix.

For example, I am generating my matrix with these lines of code:

library(stringdist)
lookup <- c('Dog', 'Cat', 'Bear')
data <- c('Do g', 'Do gg', 'Caat')
d.matrix <- stringdistmatrix(a = lookup, b = data, useNames="strings",method="cosine")

The matrix looks something like this:

[Image: the resulting distance matrix, with the lookup strings as rows and the data strings as columns]

My approach is to take the lowest value in each column as the best match (with method="cosine", stringdist returns a cosine distance, so a lower value means a closer match).

For example, "Do g" would match with "Dog".

What I want to generate is a data frame of matching pairs with the corresponding cosine value:

data   | matchwith | cosine.s
Do g   | Dog       | 0.1338746
Do gg  | Dog       | 0.1271284
Caat   | Cat       | 0.05719096

I have no clue how to get the data into the table format that I want (above).

Any help would be much appreciated.

Phurich.P asked Jan 18 '26 15:01


1 Answer

The which.min function is a good fit for this problem: it returns the index of the first minimum of a vector.
This is a solution using base R:

library(stringdist)
lookup <- c('Dog', 'Cat', 'Bear')
data <- c('Do g', 'Do gg', 'Caat')
d.matrix <- stringdistmatrix(a = lookup, b = data, useNames="strings",method="cosine")

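# minimum cosine distance in each column (one column per element of data)
cosines <- apply(d.matrix, 2, min)

# row index of that minimum in each column
minlist <- apply(d.matrix, 2, which.min)

# lookup string that corresponds to each row index
matchwith <- lookup[minlist]

# final answer: one row per element of data, with the column named as in the question
answer <- data.frame(data, matchwith, cosine.s = cosines)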
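
If you need this more than once, the same steps can be wrapped in a small helper. This is only a sketch under a few assumptions: best_match is a name made up for illustration, it expects a distance matrix with the lookup strings as row names and the data strings as column names, and the demo uses base R's adist (Levenshtein edit distance) in place of stringdist's cosine distance so that it runs without extra packages. A matrix built with stringdistmatrix(..., useNames="strings") as above should work the same way.

```r
# Hypothetical helper: for each column of a distance matrix, return the
# row (lookup string) with the smallest distance and that distance.
best_match <- function(d) {
  # row index of the first minimum in each column
  idx <- apply(d, 2, which.min)
  data.frame(
    data      = colnames(d),
    matchwith = rownames(d)[idx],
    # pick element (idx[j], j) from every column j via matrix indexing
    distance  = d[cbind(idx, seq_len(ncol(d)))],
    stringsAsFactors = FALSE
  )
}

lookup <- c('Dog', 'Cat', 'Bear')
data   <- c('Do g', 'Do gg', 'Caat')

# Demo with Levenshtein distance from base R (utils::adist), not cosine
d <- adist(lookup, data)
dimnames(d) <- list(lookup, data)

res <- best_match(d)
print(res)
```

Note that which.min keeps only the first minimum, so if two lookup strings are equally close to a data string, the one that appears earlier in lookup is reported.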
Dave2e answered Jan 21 '26 06:01