Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server Full-Text Search for exact match with fallback

First off there seems to be no way to get an exact match using a full-text search. This seems to be a highly discussed issue when using the full-text search method and there are lots of different solutions to achieve the desired result, however most seem very inefficient. Being I'm forced to use full-text search due to the volume of my database I recently had to implement one of these solutions to get more accurate results.

I could not use the ranking results from the full-text search because of how it works. For instance if you searched for a movie called Toy Story and there was also a movie called The Story Behind Toy Story that would come up instead of the exact match because it found the word Story twice and Toy.

I do track my own rankings which I call "Popularity" each time a user access a record the number goes up. I use this datapoint to weight my results to help determine what the user might be looking for.

I also have the issue where sometimes need to fall back to a LIKE search and not return an exact match. I.e. searching Goonies should return The Goonies (most popular result)

So here is an example of my current stored procedure for achieving this:

DECLARE @Title varchar(255)
SET @Title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE @Title2 varchar(255)
SET @Title2 = REPLACE(@title, '"', '')

--get top 100 results using full-text search and sort them by popularity
SELECT TOP(100) id, title, popularity As Weight into #TempTable FROM movies WHERE CONTAINS(title, @Title) ORDER BY [Weight] DESC

--check if exact match can be found
IF EXISTS(select * from #TempTable where Title = @title2)
--return exact match
SELECT TOP(1) * from #TempTable where Title = @title2
ELSE
--no exact match found, try using like with wildcards
SELECT TOP(1) * from #TempTable where Title like '%' + @title2 + '%'
DROP TABLE #TEMPTABLE

This stored procedure is executed about 5,000 times a minute, and crazy enough it's not bringing my server to it's knees. But I really want to know if there was a more efficient approach to this? Thanks.

like image 247
bfritz Avatar asked Apr 21 '13 11:04

bfritz


3 Answers

You should use full text search CONTAINSTABLE to find the top 100 (possibly 200) candidate results and then order the results you found using your own criteria.

It sounds like you'd like to ORDER BY

  1. exact match of the phrase (=)
  2. the fully matched phrase (LIKE)
  3. higher value for the Popularity column
  4. the Rank from the CONTAINSTABLE

But you can toy around with the exact order you prefer.

In SQL that looks something like:

DECLARE @title varchar(255)
SET @title = '"Toy Story"'
--need to remove quotes from parameter for LIKE search
DECLARE @title2 varchar(255)
SET @title2 = REPLACE(@title, '"', '')

SELECT
    m.ID,
    m.title,
    m.Popularity,
    k.Rank
FROM Movies m
INNER JOIN CONTAINSTABLE(Movies, title, @title, 100) as [k]
    ON m.ID = k.[Key]
ORDER BY 
  CASE WHEN m.title = @title2 THEN 0 ELSE 1 END,
  CASE WHEN m.title LIKE @title2 THEN 0 ELSE 1 END,
  m.popularity desc,
  k.rank

See SQLFiddle

like image 61
flup Avatar answered Nov 09 '22 10:11

flup


This will give you the movies that contain the exact phrase "Toy Story", ordered by their popularity.

SELECT
    m.[ID],
    m.[Popularity],
    k.[Rank]
FROM [dbo].[Movies] m
INNER JOIN CONTAINSTABLE([dbo].[Movies], [Title], N'"Toy Story"') as [k]
    ON m.[ID] = k.[Key]
ORDER BY m.[Popularity]

Note the above would also give you "The Goonies Return" if you searched "The Goonies".

like image 2
jdl Avatar answered Nov 09 '22 10:11

jdl


If got the feeling you don't really like the fuzzy part of the full text search but you do like the performance part.

Maybe is this a path: if you insist on getting the EXACT match before a weighted match you could try to hash the value. For example 'Toy Story' -> bring to lowercase -> toy story -> Hash into 4de2gs5sa (with whatever hash you like) and perform a search on the hash.

like image 1
Paul Avatar answered Nov 09 '22 08:11

Paul