Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server Weighted Full Text Search

Currently I have a table that I search upon 4 fields, FirstName, LastName, MiddleName, And AKA's. I currently have a CONTAINSTABLE search for the rows and it works. Not well but it works. Now I want to make the First Name weighted higher and middle name lower.

I found the command ISABOUT but that seems pretty worthless if I have to do it by word not column (hopefully I understood this wrong). This is not an option if its by word because I do not know how many words the user will enter.

I found the thread here that talks about this same solution however I was unable to get the accepted solution to work. Maybe I have done something wrong but regardless I cannot get it to work, and its logic seems really... odd. There has to be an easier way.

like image 405
corymathews Avatar asked Nov 21 '08 22:11

corymathews


People also ask

How do I verify a full text search in SQL Server?

How can I tell if Full-Text Search is enabled on my SQL Server instance? A: You can determine if Full-Text Search is installed by querying the FULLTEXTSERVICEPROPERTY like you can see in the following query. If the query returns 1 then Full-Text Search is enabled.

Which is used for full text search in SQL?

Full-Text Engine. The Full-Text Engine in SQL Server is fully integrated with the query processor. The Full-Text Engine compiles and executes full-text queries. As part of query execution, the Full-Text Engine might receive input from the thesaurus and stoplist.

How do I do a full text search?

To implement a full-text search in a SQL database, you must create a full-text index on each column you want to be indexed. In MySQL, this would be done with the FULLTEXT keyword. Then you will be able to query the database using MATCH and AGAINST.


2 Answers

The key to manipulating the rankings is to use a union. For each column you use a separate select statement. In that statement, add an identifier that shows from which column each row was pulled then. Insert the results into a table variable, then you can manipulate the ranking by sorting on the identifier or multiplying the rank by some value based on the identifier.

The key is to give the appearance of modifying the ranking, not to actually change sql server's ranking.

Example using a table variable:

DECLARE @Results TABLE (PersonId Int, Rank Int, Source Int)

For table People with Columns PersonId Int PK Identity, FirstName VarChar(100), MiddleName VarChar(100), LastName VarChar(100), AlsoKnown VarChar(100) with each column added to a full text catalog, you could use the query:

INSERT INTO @Results (PersonId, Rank, Source)

SELECT PersonId, Rank, 1
FROM ContainsTable(People, FirstName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION
SELECT PersonId, Rank, 2
FROM ContainsTable(People, MiddleName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION
SELECT PersonId, Rank, 3
FROM ContainsTable(People, LastName, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

UNION
SELECT PersonId, Rank, 4
FROM ContainsTable(People, AlsoKnown, @SearchValue) CT INNER JOIN People P ON CT.Key = P.PersonId

/*
Now that the results from above are in the @Results table, you can manipulate the
rankings in one of several ways, the simplest is to pull the results ordered first by Source then by Rank.  Of course you would probably join to the People table to pull the name fields.
*/

SELECT PersonId
FROM @Results
ORDER BY Source, Rank DESC

/*
A more complex manipulation would use a statement to multiply the ranking by a value above 1 (to increase rank) or less than 1 (to lower rank), then return results based on the new rank.  This provides more fine tuning, since I could make first name 10% higher and middle name 15% lower and leave last name and also known the original value.
*/

SELECT PersonId, CASE Source WHEN 1 THEN Rank * 1.1 WHEN 2 THEN Rank * .9 ELSE Rank END AS NewRank FROM @Results
ORDER BY NewRank DESC

The one downside is you'll notice I didn't use UNION ALL, so if a word appears in more than one column, the rank won't reflect that. If that's an issue you could use UNION ALL and then remove duplicate person id's by adding all or part of the duplicate record's rank to the rank of another record with the same person id.

like image 192
Dave_H Avatar answered Sep 20 '22 16:09

Dave_H


Ranks are useless across indexes, you can't merge them and expect the result to mean anything. The rank numbers of each index are apple/orange/grape/watermelon/pair comparisions that have no relative meaning WRT contents of other indexes.

Sure you can try and link/weight/order ranks between indexes to try and fudge a meaningful result but at the end of the day that result is still gibberish however possibly still good enough to provide a workable solution depending on the specifics of your situation.

In my view the best solution is to put all data you intend to be searchable in a single FTS index/column and use that columns rank to order your output.. Even if you have to duplicate field contents to accomplish the result.

like image 24
Einstein Avatar answered Sep 21 '22 16:09

Einstein