Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Levenshtein Distance in VBA [closed]

I have excel sheet with data which I want to get Levenshtein Distance between them. I already tried to export as text, read in from script (php), run Levenshtein (calculate Levenshtein Distance), save it to excel again.

But I am looking for a way to programatically calculate a Levenshtein Distance in VBA. How would I go about doing so?

like image 557
Yousf Avatar asked Nov 22 '10 06:11

Yousf


People also ask

How is Levenshtein distance calculated?

The Levenshtein distance is usually calculated by preparing a matrix of size (M+1)x(N+1) —where M and N are the lengths of the 2 words—and looping through said matrix using 2 for loops, performing some calculations within each iteration.

Is Levenshtein distance edit distance?

The Levenshtein distance (a.k.a edit distance) is a measure of similarity between two strings. It is defined as the minimum number of changes required to convert string a into string b (this is done by inserting, deleting or replacing a character in string a ).

Is Levenshtein distance NLP?

The Levenshtein distance used as a metric provides a boost to accuracy of an NLP model by verifying each named entity in the entry. The vector search solution does a good job, and finds the most similar entry as defined by the vectorization.


2 Answers

Translated from Wikipedia :

Option Explicit Public Function Levenshtein(s1 As String, s2 As String)  Dim i As Integer Dim j As Integer Dim l1 As Integer Dim l2 As Integer Dim d() As Integer Dim min1 As Integer Dim min2 As Integer  l1 = Len(s1) l2 = Len(s2) ReDim d(l1, l2) For i = 0 To l1     d(i, 0) = i Next For j = 0 To l2     d(0, j) = j Next For i = 1 To l1     For j = 1 To l2         If Mid(s1, i, 1) = Mid(s2, j, 1) Then             d(i, j) = d(i - 1, j - 1)         Else             min1 = d(i - 1, j) + 1             min2 = d(i, j - 1) + 1             If min2 < min1 Then                 min1 = min2             End If             min2 = d(i - 1, j - 1) + 1             If min2 < min1 Then                 min1 = min2             End If             d(i, j) = min1         End If     Next Next Levenshtein = d(l1, l2) End Function 

?Levenshtein("saturday","sunday")

3

like image 52
smirkingman Avatar answered Sep 20 '22 15:09

smirkingman


Thanks to smirkingman for the nice code post. Here is an optimized version.

1) Use Asc(Mid$(s1, i, 1) instead. Numerical comparision is generally faster than text.

2) Use Mid$ istead of Mid since the later is the variant ver. and adding $ is string ver.

3) Use application function for min. (personal preference only)

4) Use Long instead of Integers since it's what excel natively uses.

Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long  Dim i As Long, j As Long Dim string1_length As Long Dim string2_length As Long Dim distance() As Long  string1_length = Len(string1) string2_length = Len(string2) ReDim distance(string1_length, string2_length)  For i = 0 To string1_length     distance(i, 0) = i Next  For j = 0 To string2_length     distance(0, j) = j Next  For i = 1 To string1_length     For j = 1 To string2_length         If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then             distance(i, j) = distance(i - 1, j - 1)         Else             distance(i, j) = Application.WorksheetFunction.Min _             (distance(i - 1, j) + 1, _              distance(i, j - 1) + 1, _              distance(i - 1, j - 1) + 1)         End If     Next Next  Levenshtein = distance(string1_length, string2_length)  End Function 

UPDATE:

For those who want it: I think it's safe to say that most people use Levenshtein distance to calculate fuzzy match percentages. Here's a way to do that, and I have added an optimization that you can specify the min. match % to return (default is 70%+. You enter percentags like "50" or "80", or "0" to run the formula regardless).

The speed boost comes from the fact that the function will check if it's even possible that it's within the percentage you give it by checking the length of the 2 strings. Please note there are some areas where this function can be optimized, but I have kept it at this for the sake of readability. I concatenated the distance in result for proof of functionality, but you can fiddle with it :)

Function FuzzyMatch(ByVal string1 As String, _                     ByVal string2 As String, _                     Optional min_percentage As Long = 70) As String  Dim i As Long, j As Long Dim string1_length As Long Dim string2_length As Long Dim distance() As Long, result As Long  string1_length = Len(string1) string2_length = Len(string2)  ' Check if not too long If string1_length >= string2_length * (min_percentage / 100) Then     ' Check if not too short     If string1_length <= string2_length * ((200 - min_percentage) / 100) Then          ReDim distance(string1_length, string2_length)         For i = 0 To string1_length: distance(i, 0) = i: Next         For j = 0 To string2_length: distance(0, j) = j: Next          For i = 1 To string1_length             For j = 1 To string2_length                 If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then                     distance(i, j) = distance(i - 1, j - 1)                 Else                     distance(i, j) = Application.WorksheetFunction.Min _                     (distance(i - 1, j) + 1, _                      distance(i, j - 1) + 1, _                      distance(i - 1, j - 1) + 1)                 End If             Next         Next         result = distance(string1_length, string2_length) 'The distance     End If End If  If result <> 0 Then     FuzzyMatch = (CLng((100 - ((result / string1_length) * 100)))) & _                  "% (" & result & ")" 'Convert to percentage Else     FuzzyMatch = "Not a match" End If  End Function 
like image 44
aevanko Avatar answered Sep 16 '22 15:09

aevanko