
How can I get a percentage match when comparing two address strings?

I am trying to compare two lists of names and addresses to find unique data. I can easily extract all entries that are exactly the same string in both lists. What remains are names and addresses that differ but may still belong to the same people, e.g.:

Entry in list 1: Smith J Ph234567 34 Smith st

Entry in list 2: Smith John Ph234567 34 Smith st

or

Entry in list 1: Smith J Ph234567 34 Smith Rd

Entry in list 2: Smith J Ph234567 34 Smith Road

I want to tag entries that appear similar to each other, e.g. an 80% match.

Nested For Each loops don't work, because they compare every word (or letter, depending on how you write it) in the string with every other word or letter.

Plain For loops that compare position by position don't work either, because a single difference such as J vs. John throws off the comparison for everything after it.
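To illustrate why a position-by-position comparison breaks down, here is a minimal Python sketch (Python is used only for illustration; the logic carries over directly to VB.NET or C#). `positional_match_ratio` is a hypothetical helper: it counts the positions holding the same character in both strings and divides by the length of the longer one.

```python
def positional_match_ratio(a: str, b: str) -> float:
    """Naive score: fraction of positions where a and b hold the same character."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0  # two empty strings are trivially identical
    # zip stops at the shorter string; missing tail positions count as mismatches
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / longer


entry1 = "Smith J Ph234567 34 Smith st"
entry2 = "Smith John Ph234567 34 Smith st"
print(f"{positional_match_ratio(entry1, entry2):.0%}")  # prints 26%
```

Once "J" becomes "John", everything after it is shifted by three characters, so almost no positions line up even though the rest of the entry is identical.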

I am writing this in VB.NET, but I can also translate from C#.

asked Mar 13 '13 at 23:03 by netchicken



1 Answer

This kind of problem is generally solved by calculating the edit distance between the strings. The Levenshtein distance is a good place to start.
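A minimal sketch of the Levenshtein distance in Python (for illustration only; it translates line for line to VB.NET or C#). It uses the standard Wagner-Fischer dynamic-programming table, keeping only two rows in memory. The function name `levenshtein` is my own choice, not a library API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (Wagner-Fischer DP, two rows)."""
    # The distance is symmetric, so keep the shorter string in b to keep rows small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("Smith J Ph234567 34 Smith st",
                  "Smith John Ph234567 34 Smith st"))  # 3
```

Unlike a positional loop, the insertions "ohn" cost exactly 3 and the rest of the string still lines up.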

This will give you a score: the number of "edit operations" necessary to transform one string into the other. To convert this into a percent identity, you need to normalise it by the length of the longer string, something along the lines of percent = 100 * (largerString.Length - editDistance) / largerString.Length.
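As a worked example, here is that normalisation applied to the question's two pairs, assuming the strings are compared exactly as written (spaces and case included). The distances are easy to check by hand: "J" to "John" is three insertions, and "Rd" to "Road" is two. Here $L$ is the length of the longer string and $d$ the edit distance.

```latex
\begin{aligned}
\text{percent} &= 100 \times \frac{L - d}{L}, \qquad L = \max(|s_1|, |s_2|) \\
\text{J vs. John:}\quad L = 31,\ d = 3 &\;\Rightarrow\; 100 \times \tfrac{28}{31} \approx 90.3\% \\
\text{Rd vs. Road:}\quad L = 30,\ d = 2 &\;\Rightarrow\; 100 \times \tfrac{28}{30} \approx 93.3\%
\end{aligned}
```

Both pairs clear an 80% threshold, so they would be tagged as likely matches.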

answered Oct 04 '22 at 20:10 by Konrad Rudolph
