
SQL Server 2005 Full Text Search over multiple tables and columns

I'm looking for a good way to use the CONTAINSTABLE feature of SQL Server 2005 effectively. Currently I have, for example, an Employee table and an Address table.

-Employee
Id
Name

-Address
Id
Street
City
EmployeeId

The user enters search terms in a single textbox, and I want these terms to be split and searched with an "AND" operator. FREETEXTTABLE seems to combine terms with "OR" automatically.

Now let's say the user entered "John Hamburg". This means he wants to find John in Hamburg, so the search is "John AND Hamburg".

The query at the end of this question returns no results, since CONTAINSTABLE checks every column for "John AND Hamburg", and no single column contains both terms.

So my question is: What is the best way to perform a fulltext search with AND operators across multiple columns/tables?

SELECT *
FROM Employee emp
    LEFT OUTER JOIN [Address] addr ON addr.EmployeeId = emp.Id
    INNER JOIN
        CONTAINSTABLE(Employee, *, '(John AND Hamburg)', 1000) AS keyTblEmp
        ON emp.Id = keyTblEmp.[KEY]
UNION ALL
SELECT *
FROM Employee emp
    LEFT OUTER JOIN [Address] addr ON addr.EmployeeId = emp.Id
    INNER JOIN
        CONTAINSTABLE([Address], *, '(John AND Hamburg)', 1000) AS keyTblAddr
        ON addr.Id = keyTblAddr.[KEY]

...
Chris asked Feb 28 '23 19:02


2 Answers

This is more of a syntax problem. How do you divine the user's intent with just one input box?

  • Are they looking for "John Hamburg" the person?
  • Are they looking for "John Hamburg Street"?
  • Are they looking for "John" who lives on "Hamburg Street" in Springfield?
  • Are they looking for "John" who lives in the city of "Hamburg"?

Without knowing the user's intent, the best you can hope for is to OR the terms, and take the highest ranking hits.

Otherwise, you need to program in a ton of logic, depending on the number of words passed in:

2 words:

Search Employee data for term 1, search Employee data for term 2, search Address data for term 1, search Address data for term 2. Merge results by term, order by most hits.

3 words:

Search Employee data for term 1, search Employee data for term 2, search Employee data for term 3, search Address data for term 1, search Address data for term 2, search Address data for term 3. Merge results by term, order by most hits.

etc...
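
For the two-word case, the per-term searches and the merge step could be sketched roughly like this. It is only an illustration against the Employee/Address schema from the question: the aliases hits, Term, TermsMatched and TotalRank are made up for this example, and the terms are hard-coded rather than built from user input.

```sql
-- Sketch only: one CONTAINSTABLE call per term and per table,
-- each tagged with the term it searched for, then merged per employee.
SELECT emp.Id, emp.[Name],
       COUNT(DISTINCT hits.Term) AS TermsMatched, -- how many distinct terms hit this employee
       SUM(hits.[Rank])          AS TotalRank     -- summed full-text rank across all hits
FROM Employee emp
JOIN
(
    -- Employee data, term 1
    SELECT k.[KEY] AS EmployeeId, 'John' AS Term, k.[RANK] AS [Rank]
    FROM CONTAINSTABLE(Employee, *, 'John') AS k
    UNION ALL
    -- Employee data, term 2
    SELECT k.[KEY], 'Hamburg', k.[RANK]
    FROM CONTAINSTABLE(Employee, *, 'Hamburg') AS k
    UNION ALL
    -- Address data, term 1 (mapped back to the owning employee)
    SELECT addr.EmployeeId, 'John', k.[RANK]
    FROM CONTAINSTABLE([Address], *, 'John') AS k
    JOIN [Address] addr ON addr.Id = k.[KEY]
    UNION ALL
    -- Address data, term 2
    SELECT addr.EmployeeId, 'Hamburg', k.[RANK]
    FROM CONTAINSTABLE([Address], *, 'Hamburg') AS k
    JOIN [Address] addr ON addr.Id = k.[KEY]
) AS hits ON hits.EmployeeId = emp.Id
GROUP BY emp.Id, emp.[Name]
ORDER BY TermsMatched DESC, TotalRank DESC
```

Adding HAVING COUNT(DISTINCT hits.Term) = 2 after the GROUP BY would keep only employees for whom every term matched somewhere, which is close to an "AND" across both tables. It still cannot tell which term was meant as a name and which as an address.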

I would redesign the GUI to separate the input into Name and Address, at a minimum. If that is not possible, enforce a syntax rule to the effect of "the first words are treated as a name until a comma appears; any words after that are treated as address terms".
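
A minimal sketch of that comma rule, assuming the input always contains exactly one comma with non-empty text on both sides (the variable names are invented for this example, and a real version would need to handle missing commas and strip double quotes from the input):

```sql
-- Sketch only: split 'name, address' at the comma and search each part
-- against the table it is assumed to belong to.
DECLARE @input    nvarchar(200),
        @comma    int,
        @nameTerm nvarchar(210),
        @addrTerm nvarchar(210)

SET @input = N'John, Hamburg'
SET @comma = CHARINDEX(N',', @input)

-- Wrap each part in double quotes so a multi-word part is searched as a phrase.
SET @nameTerm = N'"' + LTRIM(RTRIM(LEFT(@input, @comma - 1))) + N'"'
SET @addrTerm = N'"' + LTRIM(RTRIM(SUBSTRING(@input, @comma + 1, LEN(@input)))) + N'"'

SELECT emp.Id, emp.[Name], addr.Street, addr.City
FROM Employee emp
JOIN [Address] addr ON addr.EmployeeId = emp.Id
WHERE CONTAINS(emp.[Name], @nameTerm)                 -- name part must match the employee
  AND CONTAINS((addr.Street, addr.City), @addrTerm)   -- address part must match street or city
```

Because each part is checked against its own table, the two conditions can be combined with a plain AND in the WHERE clause.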

EDIT:

Your best bet is still to OR the terms and take the highest-ranking hits. Here's an example of that, which also shows why it is not ideal without some pre-processing of the input to divine the user's intent:

insert into Employee (id, [name]) values (1, 'John Hamburg')
insert into Employee (id, [name]) values (2, 'John Smith')
insert into Employee (id, [name]) values (3, 'Bob Hamburg')
insert into Employee (id, [name]) values (4, 'Bob Smith')
insert into Employee (id, [name]) values (5, 'John Doe')

insert into Address (id, street, city, employeeid) values (1, 'Main St.', 'Springville', 1)
insert into Address (id, street, city, employeeid) values (2, 'Hamburg St.', 'Springville', 2)
insert into Address (id, street, city, employeeid) values (3, 'St. John Ave.', 'Springville', 3)
insert into Address (id, street, city, employeeid) values (4, '5th Ave.', 'Hamburg', 4)
insert into Address (id, street, city, employeeid) values (5, 'Oak Lane', 'Hamburg', 5)

Since we don't know which keywords apply to which table, we have to assume they could apply to either. So we OR the terms against each table, UNION the results, aggregate them, and sum the ranks.

SELECT Id, [Name], Street, City, SUM([Rank]) AS [Rank]
FROM
(
    -- hits in the Employee full-text index
    SELECT emp.Id, emp.[Name], addr.Street, addr.City, keyTblEmp.[RANK] AS [Rank]
    FROM Employee emp 
    JOIN [Address] addr ON emp.Id = addr.EmployeeId
    JOIN CONTAINSTABLE(Employee, *, 'John OR Hamburg') AS keyTblEmp ON emp.Id = keyTblEmp.[KEY]

    UNION ALL

    -- hits in the Address full-text index
    SELECT emp.Id, emp.[Name], addr.Street, addr.City, keyTblAdd.[RANK] AS [Rank]
    FROM Employee emp 
    JOIN [Address] addr ON emp.Id = addr.EmployeeId
    JOIN CONTAINSTABLE([Address], *, 'John OR Hamburg') AS keyTblAdd ON addr.Id = keyTblAdd.[KEY]
) AS tmp

GROUP BY Id, [Name], Street, City
ORDER BY SUM([Rank]) DESC

This is less than ideal. Here's what you get for the example (in your case, you would have wanted John Doe from Hamburg to show up first):

Id       Name              Street            City           Rank
2        John Smith        Hamburg St.       Springville    112
3        Bob Hamburg       St. John Ave.     Springville    112
5        John Doe          Oak Lane          Hamburg        96
1        John Hamburg      Main St.          Springville    48
4        Bob Smith         5th Ave.          Hamburg        48

But that is the best you can do without parsing the input before submitting it to SQL to make a "best guess" at what the user wants.

GalacticJello answered Mar 02 '23 13:03


I had the same problem. Here is my solution, which worked for my case:

I created a view that returns the columns I want, plus one extra column that concatenates all the columns I want to search. In this case the view would look like this:

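CREATE VIEW vEmpAddr AS
-- explicit columns: emp.* plus addr.* would give the view two Id columns
SELECT emp.Id, emp.[Name], addr.Street, addr.City,
       ISNULL(emp.[Name], '') + ' ' + ISNULL(addr.City, '') AS SearchResult
FROM Employee emp
    LEFT OUTER JOIN [Address] addr ON addr.EmployeeId = emp.Id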

After this I created a full-text index on the SearchResult column. Then I search on this column:

SELECT *
FROM vEmpAddr ea
INNER JOIN CONTAINSTABLE(vEmpAddr, *, 'John AND Hamburg') a ON ea.ID = a.[Key]
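
One caveat, as far as I know: SQL Server only allows a full-text index on a view if it is an indexed view, and indexed views need SCHEMABINDING, two-part table names, an explicit column list, and no outer joins. A rough sketch of what that could look like follows. The names vEmpAddrSearch, AddressId, SearchText, IX_vEmpAddrSearch and ftEmpAddr are made up for this example, Address.Id is assumed to be a non-nullable unique key, and I have not verified this against a 2005 instance.

```sql
-- Sketch only: an indexed view with one concatenated search column.
-- The SET options that indexed views require (ANSI_NULLS, QUOTED_IDENTIFIER, ...) must be ON.
CREATE VIEW dbo.vEmpAddrSearch
WITH SCHEMABINDING
AS
SELECT addr.Id AS AddressId,      -- unique per row, used as the full-text key
       emp.Id  AS EmployeeId,
       emp.[Name], addr.Street, addr.City,
       ISNULL(emp.[Name], '') + ' ' + ISNULL(addr.Street, '') + ' ' + ISNULL(addr.City, '') AS SearchText
FROM dbo.Employee emp
    INNER JOIN dbo.[Address] addr ON addr.EmployeeId = emp.Id  -- outer joins are not allowed here
GO

-- The unique clustered index turns the view into an indexed view.
CREATE UNIQUE CLUSTERED INDEX IX_vEmpAddrSearch ON dbo.vEmpAddrSearch (AddressId)
GO

CREATE FULLTEXT CATALOG ftEmpAddr
GO

CREATE FULLTEXT INDEX ON dbo.vEmpAddrSearch (SearchText)
    KEY INDEX IX_vEmpAddrSearch ON ftEmpAddr
GO

-- Both terms now only have to appear somewhere in the concatenated column.
SELECT ea.*
FROM dbo.vEmpAddrSearch ea
WHERE CONTAINS(ea.SearchText, 'John AND Hamburg')
```

The trade-off of the INNER JOIN is that employees without an address do not appear in the view at all, so they would need a separate search against Employee.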

Thea answered Mar 02 '23 12:03