Hope i am not repeating this question. I did some search here and google before posting here.
I am running a eStore with SQL Server 2008R2 with Full Text enabled.
I am using a Hybrid Fulltext and normal like to do search. This gives more relevant results. All the queries executed in to a temp table and distincts were returned.
Matching logic,
Run following SQL to get relevant product using full text. But @Keywords will be pre-processed. Say 'CLC 2200' will be changed to 'CLC* AND 2200*'
SELECT Id FROM dbo.Product WHERE CONTAINS (TextSearch ,@Keywords)
Another query will be running using normal like. So 'CLC 2200' will be pre-processed to 'TextSearch like %clc% AND TextSearch like %2200%'. This is simply because full text search wont search patterns before the keywords. example, it wont return 'pclc 2200'.
SELECT Id FROM dbo.Product WHERE TextSearch like '%clc%' AND TextSearch like '%2200%'
If step 1 and 2 didn't return any records, following search will be executed. Value 135 was fine tuned by me to return more relevant records.
SELECT p.id FROM dbo.Product AS p INNER JOIN FREETEXTTABLE(product,TextSearch,@Keywords) AS r ON p.Id = r.[KEY] WHERE r.RANK > 135
All of above combined works fine in a reasonable speed and returns relevant products for keywords.
But i am looking for to further improve when there is no product found at all.
Say if customer looks for 'CLC 2200npk' and this product wasn't there, i needed to show next very close by 'CLC 2200'.
So far i tried using Soundex() function. Buy computing soundex value for each word in TextSearch column and comparing with soudex value of keyword. But this returns way too many records and slow too.
example, 'CLC 2200npk' will return products such as 'CLC 1100' etc. But this wouldn't be a good result. As it is not close to CLC 2200npk
There is another good one here. but this uses CLR Functions. But i can not install CLR functions on the server.
So my logic would need to be,
if 'CLC 2200npk' not found, show close by 'CLC 2200' if 'CLC 2200' not found, show next close by 'CLC 1100'
Thanks.
A technique of finding the strings that match a pattern approximately (rather than exactly). Users / Reviewers often capture names inaccurately.
Fuzzy matching (FM), also known as fuzzy logic, approximate string matching, fuzzy name matching, or fuzzy string matching is an artificial intelligence and machine learning technology that identifies similar, but not identical elements in data table sets.
Specifies a search condition for a graph. MATCH can be used only with graph node and edge tables, in the SELECT statement as part of WHERE clause.
A rather quick domain specific solution may be to calculate a string similarity using SOUNDEX and a numeric distance between 2 strings. This will only really help when you have a lot of product codes.
Using a simple UDF like below you can extract the numeric chars from a string so that you can then get 2200 out of 'CLC 2200npk' and 1100 out of 'CLC 1100' so you can now determine closeness based on the SOUNDEX output of each input as well as closeness of the numeric component of each input.
CREATE Function [dbo].[ExtractNumeric](@input VARCHAR(1000))
RETURNS INT
AS
BEGIN
WHILE PATINDEX('%[^0-9]%', @input) > 0
BEGIN
SET @input = STUFF(@input, PATINDEX('%[^0-9]%', @input), 1, '')
END
IF @input = '' OR @input IS NULL
SET @input = '0'
RETURN CAST(@input AS INT)
END
GO
As far as general purpose algorithms go there are a couple which might help you with varying degrees of success depending on data set size and performance requirements. (both links have TSQL implementations available)
Here is an interesting article which applies both algos together which may give you a few ideas.
Well hopefully some of that helps a little.
EDIT: Here is a much faster partial Levenshtein Distance implementation (read the post it wont return exact same results as the normal one). On my test table of 125000 rows it runs in 6 seconds compared to 60 seconds for the first one I linked to.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With