I'm looking for advice on how to tackle the issue of different spelling for the same name. I have a SQL Server database with company names, and there are some companies that are the same but the spelling is different.
For example:
Building Supplies pty
Buidings Supplies pty
Building Supplied l/d
The problem is that there are no clear consistencies in the variation. Sometimes it's an extra 's', other times its an extra space.
Unfortunately I don't have a lookup list, so I can't use Fuzzy LookUp. I need to create the clean list.
Is there a method that people use to deal with this problem?
p.s I tried searching for this problem but can't seem to find a similar thread
Thanks
You can use SOUNDEX()
DIFFERENCE()
for this purpose.
DECLARE @SampleData TABLE(ID INT, BLD VARCHAR(50), SUP VARCHAR(50))
INSERT INTO @SampleData
SELECT 1, 'Building','Supplies'
UNION
SELECT 2, 'Buidings','Supplies'
UNION
SELECT 3, 'Biulding','Supplied'
UNION
SELECT 4, 'Road','Contractor'
UNION
SELECT 5, 'Raod','Consractor'
UNION
SELECT 6, 'Highway','Supplies'
SELECT *, DIFFERENCE('Building', BLD) AS DIF
FROM @SampleData
WHERE DIFFERENCE('Building', BLD) >= 3
Result
ID BLD SUP DIF
1 Building Supplies 4
2 Buidings Supplies 3
3 Biulding Supplied 4
If this serves your purpose you can write an update query to update selected record accordingly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With