Matching fuzzy strings

I have two tables in PostgreSQL that I need to merge on a common column, the company name. Unfortunately, many of the company names don't match exactly (e.g. MICROSOFT in one table, MICROSFT in the other). I've tried removing common words such as "corporation", "inc", or "ltd" from both columns to standardize the names across the two tables, but I'm having trouble thinking of additional strategies. Any ideas?

Thanks.

Also, if necessary I can do this in R.

asked Jan 19 '12 16:01

aesir

1 Answer

Have you considered the fuzzystrmatch module? You can use soundex, difference, levenshtein, metaphone and dmetaphone, or a combination.

fuzzystrmatch documentation

-- Placeholder names; replace Carefully_Selected_Threshold with an integer tuned to your data.
SELECT something
FROM somewhere
WHERE levenshtein(item1, item2) < Carefully_Selected_Threshold;

For example, the Levenshtein distance from MICROSOFT to MICROSFT is one (1).

SELECT levenshtein(dmetaphone('MICROSOFT'), dmetaphone('MICROSFT'));

The above returns zero (0). Combining levenshtein and dmetaphone could help you match lots of misspellings.
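If you want to experiment with thresholds outside the database before settling on one, here is a minimal sketch in Python of the same edit-distance comparison. It is a plain textbook implementation, not the code fuzzystrmatch uses, and the sample names, the helper fuzzy_pairs, and the threshold of 3 are all made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_pairs(left, right, threshold):
    """Hypothetical helper: all (left, right, distance) with distance < threshold."""
    matches = []
    for name_a in left:
        for name_b in right:
            distance = levenshtein(name_a.upper(), name_b.upper())
            if distance < threshold:
                matches.append((name_a, name_b, distance))
    return matches


# Made-up sample data; a threshold of 3 is an arbitrary starting point.
table_a = ["MICROSOFT", "ORACLE", "INTEL"]
table_b = ["MICROSFT", "ORCALE", "IBM"]
pairs = fuzzy_pairs(table_a, table_b, threshold=3)

for pair in pairs:
    print(pair)
```

This compares every name against every other name, so it is only practical for small samples. A looser threshold catches more misspellings but also starts pairing unrelated short names, which is why the threshold needs to be checked against real rows.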

answered Nov 03 '22 01:11

Anders Marzi Tornblad

Tags: string, database, matching, postgresql, fuzzy
