Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compare Strings for Percentage Match using vb.net?

Tags:

vb.net

I am banging my head against the wall for a while now trying different techniques.

None of them are working well.

I have two strings.

I need to compare them and get an exact percentage of match,

ie. "four score and seven years ago" TO "for scor and sevn yeres ago"

Well, I first started by comparing every word to every word, tracking every hit, and percentage = count \ numOfWords. Nope, didn't take into account misspelled words.

("four" <> "for" even though it is close)

Then I started by trying to compare every char in each char, incrementing the string char if not a match (to count for misspellings). But, I would get false hits because the first string could have every char in the second but not in the exact order of the second. ("stuff avail" <> "stu vail" (but it would come back as such, low percentage, but a hit. 9 \ 11 = 81%))

SO, I then tried comparing PAIRS of chars in each string. If string1[i] = string2[k] AND string1[i+1] = string2[k+1], increment the count, and increment the "k" when it doesn't match (to track mispellings. "for" and "four" should come back with a 75% hit.) That doesn't seem to work either. It is getting closer, but even with an exact match it is only returns 94%. And then it really gets screwed up when something is really misspelled. (Code at the bottom)

Any ideas or directions to go?

Code

count = 0
j = 0
k = 0
While j < strTempName.Length - 2 And k < strTempFile.Length - 2
    ' To ignore non letters or digits '
    If Not strTempName(j).IsLetter(strTempName(j)) Then
        j += 1
    End If

    ' To ignore non letters or digits '
    If Not strTempFile(k).IsLetter(strTempFile(k)) Then
        k += 1
    End If

    ' compare pair of chars '
    While (strTempName(j) <> strTempFile(k) And _ 
           strTempName(j + 1) <> strTempFile(k + 1) And _ 
           k < strTempFile.Length - 2)
        k += 1
    End While
    count += 1
    j += 1
    k += 1

End While

perc = count / (strTempName.Length - 1)
like image 865
Jad Chahine Avatar asked Sep 13 '25 12:09

Jad Chahine


2 Answers

Edit: I have been doing some research and I think I initially found the code from here and translated it to vbnet years ago. It uses the Levenshtein string matching algorithm.

Here is the code I use for that, hope it helps:

Sub Main()
    Dim string1 As String = "four score and seven years ago"
    Dim string2 As String = "for scor and sevn yeres ago"
    Dim similarity As Single =
        GetSimilarity(string1, string2)
    ' RESULT : 0.8
End Sub

Public Function GetSimilarity(string1 As String, string2 As String) As Single
    Dim dis As Single = ComputeDistance(string1, string2)
    Dim maxLen As Single = string1.Length
    If maxLen < string2.Length Then
        maxLen = string2.Length
    End If
    If maxLen = 0.0F Then
        Return 1.0F
    Else
        Return 1.0F - dis / maxLen
    End If
End Function

Private Function ComputeDistance(s As String, t As String) As Integer
    Dim n As Integer = s.Length
    Dim m As Integer = t.Length
    Dim distance As Integer(,) = New Integer(n, m) {}
    ' matrix
    Dim cost As Integer = 0
    If n = 0 Then
        Return m
    End If
    If m = 0 Then
        Return n
    End If
    'init1

    Dim i As Integer = 0
    While i <= n
        distance(i, 0) = System.Math.Max(System.Threading.Interlocked.Increment(i), i - 1)
    End While
    Dim j As Integer = 0
    While j <= m
        distance(0, j) = System.Math.Max(System.Threading.Interlocked.Increment(j), j - 1)
    End While
    'find min distance

    For i = 1 To n
        For j = 1 To m
            cost = (If(t.Substring(j - 1, 1) = s.Substring(i - 1, 1), 0, 1))
            distance(i, j) = Math.Min(distance(i - 1, j) + 1, Math.Min(distance(i, j - 1) + 1, distance(i - 1, j - 1) + cost))
        Next
    Next
    Return distance(n, m)
End Function
like image 190
Xavier Peña Avatar answered Sep 15 '25 10:09

Xavier Peña


Xavier's code must be correct to:

         While i <= n
            distance(i, 0) = System.Math.Min(System.Threading.Interlocked.Increment(i), i - 1)
        End While
        Dim j As Integer = 0
        While j <= m
            distance(0, j) = System.Math.Min(System.Threading.Interlocked.Increment(j), j - 1)
        End While
like image 39
Predator Avatar answered Sep 15 '25 11:09

Predator



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!