Smart string comparison

Tags:

I am looking for a library/class that allows smart compare of two strings. At best it would give as a result percent of how two strings are alike. I am comparing company names, addresses that are recordered in different repositories, thus having many misspellings or inconsistencies in names.

Sample strings to compare:

 "Good Company Ltd." vs. "GoodCompany" 
 "Baker Street 2" vs. "Baker Str. 2"

If I get a result in percentage of alikeness, than this can be an input for smart merge of such data.

Do you know any good libraries that would allow such smart string compare?

909

asked May 23 '13 11:05

Radoslaw

2 Answers

Levenshtein is not appropriate in this case. "Good Company Ltd" and "GoodCompany" if trimmed have a distance = 3 while "Good Company Ltd" and "Food Company Ltd" have a distance of 1, but totally a different meaning. I suggest Metaphone or Double Metaphone algorithm.

Using online metaphone comparer the results are:

Good Company Ltd = KTKMPNLTT
GoodCompany = KTKMPN
Food Company Ltd = FTKMPNLTT
GoodCompanyLLC = KTKMPNLK

In this way you know that GoodCompany, Good Company Ltd and GoodCompanyLLC are similar, while Food Company is misspelled or totally not related (KTKMPN is contained both in KTKMPNLTT and KTKMPNLK but not in FTKMPNLTT).

Look here for other algorithms comparisons.

109

answered Oct 08 '22 02:10

Francesco De Lisi

You might want to look for Levenshtein Distance implementation. It shows how many characters insertions/deletions and substitutions are required to make two strings equal.

Here is a post about libraries in C# that implement Levenshtein Distance and other text-comparison algorithms: .NET library for text algorithms?.

However I think you'll have to use some combination of methods, because using Levenshtein will tell you that 'Good Company Ltd.' is more similar to 'Bad Company Ltd.' than to 'GoodCompany'.

Maybe you'll have to do some preprocessing by expanding 'str.' to 'street' and removing 'Ltd.' as 'meaningless' word in terms of string comparison.

UPDATE 1

Francesco De Lisi suggests to use Phonetic algoritms. Looks like they are better suited for comparing misspelled names. Still you'll need to split addresses to phonetic / non-phonetic parts (like building numbers) and compare them separately.

UPDATE 2

As for addresses comparison, this post suggests to use Google Maps API for this purpose and another post discusses address parsing. I guess that Google can produce reliable results because they have a database of street addresses where they can find the most correct street name spelling. Without list of correct street/company names you may encounter some strange name that is incorrect, however many different correct names would be similar to it.

answered Oct 08 '22 01:10

Artemix

Related questions
                            
                                c# radial gradient brush effect in GDI and winforms
                            
                                C#: Call non-generic method from generic method
                            
                                Difference between SetupSet and SetupProperty in Moq
                            
                                How can I convert List<string> to List<myEnumType>?
                            
                                Why can't I find or use UrlEncode in Visual Studio 2010?
                            
                                Using 'switch' with strings in resource file
                            
                                Binding to a dependency property of a user control WPF/XAML
                            
                                Login failed. Login failed for user 'NT AUTHORITY\SYSTEM'
                            
                                C#/ASP.NET: can't remove cookies with Domain property specified
                            
                                Pass parameter to a customized template
                            
                                c# very slow populating the datagridview
                            
                                The requested operation cannot be performed error when compiling an XNA project
                            
                                Get original request url in WCF REST service
                            
                                How to iterate through enum type while skipping some values?
                            
                                C# lock(mylocker) not work
                            
                                How to plug method parameters into custom attribute
                            
                                Hide Scrollbars in the webBrowser control
                            
                                How to resize Image in C# WinRT/winmd?
                            
                                Complex "Contains" string comparison
                            
                                C# nested dictionaries

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Smart string comparison

Tags:

string

c#

string-comparison

Radoslaw

People also ask

2 Answers

Francesco De Lisi

Artemix

Recent Activity

Donate For Us