Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Slow SQL query involving CONTAINS and OR

We’re having a problem we were hoping the good folks of Stack Overflow could help us with. We’re running SQL Server 2008 R2 and are having problems with a query that takes a very long time to run on a moderate set of data , about 100000 rows. We're using CONTAINS to search through xml files and LIKE on another column to support leading wild cards.

We’ve reproduced the problem with the following small query that takes about 35 seconds to run:

SELECT something FROM table1 
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR  
        DescriptionColumn LIKE '%WhatEver%')

Query plan:

Slow query

If we modify the query above to using UNION instead, the running time drops from 35 seconds to < 1 seconds. We would like to avoid using this approach to solve the issue.

SELECT something FROM table1 WHERE (CONTAINS(TextColumn, '"WhatEver"') 
UNION
(SELECT something FROM table1 WHERE (DescriptionColumn LIKE '%WhatEver%'))

Query plan:

Fast query

The column that we’re using CONTAINS to search through is a column with type image and consists of xml files sized anywhere from 1k to 20k in size.

We have no good theories as to why the first query is so slow so we were hoping someone here would have something wise to say on the matter. The query plans don’t show anything out of the ordinary as far as we can tell. We've also rebuilt the indexes and statistics.

Is there anything blatantly obvious we’re overlooking here?

Thanks in advance for your time!

like image 445
Tilfeldig Avatar asked Jun 30 '11 11:06

Tilfeldig


People also ask

What makes an SQL query slow?

The cause of performance problems can be grouped into two categories, running or waiting: WAITING: Queries can be slow because they're waiting on a bottleneck for a long time. See a detailed list of bottlenecks in types of Waits. RUNNING: Queries can be slow because they're running (executing) for a long time.

Which is faster like or contain?

This is because CONTAINS used the full text index while LIKE cannot use the index. Thus it is clear that CONTAINS is faster than LIKE.

Which is faster between or in SQL?

Between is faster due to lesser comparisons. With IN clause each elements are traversed every time. But purpose of both are different: Between is used when you are comparing with Range of values in some kind of sequence.


3 Answers

Why are you using DescriptionColumn LIKE '%WhatEver%' instead of CONTAINS(DescriptionColumn, '"WhatEver"')?

CONTAINS is obviously a Full-Text predicate and will use the SQL Server Full-Text engine to filter the search results, however LIKE is a "normal" SQL Server keyword and so SQL Server will not use the Full-Text engine to asist with this query - In this case because the LIKE term begins with a wildcard SQL Server will be unable to use any indexes to help with the query either which will most likely result in a table scan and / or poorer performance than using the Full-Text engine.

Its difficult impossible to tell without an execution plan, however my guess on whats happening would be:

  • The UNION variation of the query is performing a table scan against table1 - the table scan is not fast, however because there are relatively few rows in the table it is not performing that slowly (compared to a 35s benchmark).

  • In the OR variation of the query SQL Server is first using the Full-Text engine to filter based on the CONTAINS and then goes on to perform an RDI lookup on each matching row in the result to filter based on the LIKE predicate, however for some reason SQL Server has massively underestimated the number of rows (this can happen with certain types of predicate) and so goes on to perform several thousnad RDI lookups which ends up being incredibly slow (a table scan would have been much quicker).

To really understand whats going on you need to get a query plan.

like image 93
Justin Avatar answered Sep 23 '22 15:09

Justin


Did you guys try this:

SELECT *
FROM table
WHERE CONTAINS((column1, column2, column3), '"*keyword*"')  

Instead of this:

SELECT *
FROM table
WHERE CONTAINS(column1, '"*keyword*"') 
OR CONTAINS(column2, '"*keyword*"') 
OR CONTAINS(column3y, '"*keyword*"') 

The first one is a lot faster.

like image 33
Edwin Hebbink Avatar answered Sep 23 '22 15:09

Edwin Hebbink


I just ran into this. This is reportedly a bug on SQL server 2008 R2:

http://www.arcomit.co.uk/support/kb.aspx?kbid=000060

Your approach of using a UNION of two selects instead of an OR is the workaround they recommend in that article.

like image 36
jsur Avatar answered Sep 20 '22 15:09

jsur