Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fuzzy name matching algorithm

I have a database containing names of certain blacklisted companies and individuals. All transactions created, its detail needs to be scanned against these blacklisted names. The created transactions may have names not correctly spelled, for example one can write "Wilson" as "Wilson", "Vilson" or "Veelson". The Fuzzy search logic or utility should match against the name "Wilson" present in the blacklisted database and based on the required correctness / accuracy percentage set by the user, has to show the matching name within the percentage set.

The transactions will be sent in batches or real time to check against black listed names.

I would appreciate, if users who had similar requirement and has implemented them, could also give their views and implementation

like image 762
user2477549 Avatar asked Oct 03 '22 20:10

user2477549


1 Answers

T-SQL leaves a lot to be desired in the realm of fuzzy search. Your best options are third party libraries, but if you don't want to mess with that, your best best is using the DIFFERENCE function built in to SQL Server. For example:

SELECT * FROM tblUsers U WHERE DIFFERENCE(U.Name, @nameEntered) >= 3

A higher return value for DIFFERENCE indicates higher accuracy. A drawback of this is that the algorithm favors words that sound alike, which may not be your desired characteristic.

This next example shows how to get the best match out of a table:

DECLARE @users TABLE (Name VARCHAR(255))

INSERT INTO @users VALUES ('Dylan'), ('Bob'), ('Tester'), ('Dude')

SELECT *, MAX(DIFFERENCE(Name, 'Dillon')) AS SCORE FROM @users GROUP BY Name ORDER BY SCORE DESC

It returns:

 Name | Score
 Dylan  4
 Dude   3
 Bob    2
 Tester 0
like image 158
dkoch74 Avatar answered Oct 07 '22 17:10

dkoch74