Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding similar sounding text in VBA [closed]

My manager tells me that there is a way to evaluate names that are spelled differently but sound similar in the way they are pronounced. Ideally, we want to be able to evaluate a user-entered search name and return exact matches as well as "similar sounding" names. He called the process "Soundits" but I cannot find any info on Google.

Does this exist? Does anyone know if it is available for VBA (Access)?

like image 345
Ed Lee Avatar asked Oct 22 '09 14:10

Ed Lee


3 Answers

Nice question! You're question includes a great example of the idea itself.

There is an algorithm called the Russell Soundex algorithm, a standard technique in many applications, that evaluates names by the phonetic rather than the actual spelling. In this question, Soundits and Soundex are similar sounding names! [EDIT: Just ran the Soundex. Soundits=S532 and Soundex=S532.]

About Soundex:

The Soundex algorithm is predicated on characteristics of English such as:

  1. The first letter has high significance
  2. Many consonants sound similar
  3. Consonants affect pronunciation more than vowels

One warning: Soundex was designed for names. The shorter the better. As a name grows longer, the Soundex becomes less reliable.

Resources:

  1. Here is an example that uses VBA for Access.
  2. There is a write-up on Soundex in the VBA Developer's Handbook, 2nd Edition by Ken Getz and Mike Gilbert.
  3. There is a lot of information about Soundex and other variants such as Soundex2 (Search for 'Soundex' and 'VBA').

Code Example:

Below is some VBA code, found via a quick web search, that implements a variation of the Soundex algorithm.

Option Compare Database
Option Explicit

Public Function Soundex(varText As Variant) As Variant
On Error GoTo Err_Handler
    Dim strSource As String
    Dim strOut As String
    Dim strValue As String
    Dim strPriorValue As String
    Dim lngPos As Long

    If Not IsError(varText) Then
        strSource = Trim$(Nz(varText, vbNullString))
        If strSource <> vbNullString Then
            strOut = Left$(strSource, 1&)
            strPriorValue = SoundexValue(strOut)
            lngPos = 2&

            Do
                strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
                If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
                    strOut = strOut & strValue
                    strPriorValue = strValue
                End If
                lngPos = lngPos + 1&
            Loop Until Len(strOut) >= 4&
        End If
    End If

    If strOut <> vbNullString Then
        Soundex = strOut
    Else
        Soundex = Null
    End If

Exit_Handler:
    Exit Function

Err_Handler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
    Resume Exit_Handler
End Function
Private Function SoundexValue(strChar As String) As String
    Select Case strChar
    Case "B", "F", "P", "V"
        SoundexValue = "1"
    Case "C", "G", "J", "K", "Q", "S", "X", "Z"
        SoundexValue = "2"
    Case "D", "T"
        SoundexValue = "3"
    Case "L"
        SoundexValue = "4"
    Case "M", "N"
        SoundexValue = "5"
    Case "R"
        SoundexValue = "6"
    Case vbNullString
        SoundexValue = "0"
    Case Else
        'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
    End Select
End Function

Levenshtein distance

Another method of comparing strings is to get the Levenshtein distance. Here is the example given in VBA, it is taken from LessThanDot Wiki:

Function LevenshteinDistance(word1, word2)

Dim s As Variant
Dim t As Variant
Dim d As Variant
Dim m, n
Dim i, j, k
Dim a(2), r
Dim cost

   m = Len(word1)
   n = Len(word2)

   ''This is the only way to use
   ''variables to dimension an array
   ReDim s(m)
   ReDim t(n)
   ReDim d(m, n)

   For i = 1 To m
       s(i) = Mid(word1, i, 1)
   Next

   For i = 1 To n
       t(i) = Mid(word2, i, 1)
   Next

   For i = 0 To m
       d(i, 0) = i
   Next

   For j = 0 To n
       d(0, j) = j
   Next


   For i = 1 To m
       For j = 1 To n

           If s(i) = t(j) Then
               cost = 0
           Else
               cost = 1
           End If

           a(0) = d(i - 1, j) + 1             '' deletion
           a(1) = d(i, j - 1) + 1             '' insertion
           a(2) = d(i - 1, j - 1) + cost      '' substitution

           r = a(0)

           For k = 1 To UBound(a)
               If a(k) < r Then r = a(k)
           Next

           d(i, j) = r

       Next

   Next

   LevenshteinDistance = d(m, n)

End Function
like image 65
9 revs, 2 users 73% Avatar answered Oct 31 '22 14:10

9 revs, 2 users 73%


Here are a couple working examples of the SOUNDEX algorithm in VBA:

  • Demo in MS Access
  • Demo in MS Excel
like image 4
ewall Avatar answered Oct 31 '22 14:10

ewall


In addition to Soundex, which is often gives you too loose a match to be really useful, you should also look at Soundex2 (a variant of Soundex that is more granular), and for a different kind of matching, Simil(). I use all three.

like image 3
David-W-Fenton Avatar answered Oct 31 '22 14:10

David-W-Fenton