Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

T-SQL - How can I make a SELECT query with multiple LIKE clauses quicker?

I've got a search function for news articles that looks like this (contains more than 5 search items):

SELECT TOP 5 *
FROM NewsArticles
WHERE (headline LIKE '% sustainable %'OR
headline LIKE '% sustainability %' OR
headline LIKE '% conservation %' OR
headline LIKE '% environmental % OR
headline LIKE '% environmentally %') 
OR
(body LIKE '% sustainable %'OR
body LIKE '% sustainability %' OR
body LIKE '% conservation %' OR
body LIKE '% environmental % OR
body LIKE '% environmentally %')
ORDER BY publishDate DESC 

This query is designed to pull out the top 5 news stories relating to sustainability and sits on my main sustainability homepage. However, it takes a while to run and the page is slow to load. So I'm looking up ways to speed this up. Having so many LIKE clauses seems cumbersome so I've tried something with a JOIN like this:

CREATE TABLE #SearchItem (Search varchar(255))

INSERT INTO #SearchItem VALUES
('sustainable'),
('sustainability'),
('conservation'),
('environmental'),
('environmentally')

SELECT TOP 5 *
FROM NewsArticles as n
JOIN #SearchItem as s
ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR 
n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
ORDER BY n.publishDate DESC

This seems to work very well for performance, but seems to sometimes bring back duplicate articles where one of the search words appears in both the body and the headline (which is often the case). I've tried using the word using 'SELECT DISTINCT TOP 5 *' but this gives me an error saying 'The ntext data type cannot be selected as DISTINCT because it is not comparable'. Is there away of stopping this from bringing back duplicates without doing 2 separate searches and using UNION?

like image 831
sr28 Avatar asked Aug 15 '13 16:08

sr28


People also ask

How Use multiple like in SQL query?

Not only LIKE, but you can also use multiple NOT LIKE conditions in SQL. You will get records of matching/non-matching characters with the LIKE – this you can achieve by percentile (%) wildcard character. Below use cases, help you know how to use single and multiple like conditions.

Do more WHERE clauses speed up queries?

A where clause will generally increase the performance of the database. Generally, it is more expensive to return data and filter in the application. The database can optimize the query, using indexes and partitions. The database may be running in parallel, executing the query in parallel.

How do you increase the speed of a query?

The way to make a query run faster is to reduce the number of calculations that the software (and therefore hardware) must perform. To do this, you'll need some understanding of how SQL actually makes calculations.


4 Answers

If you are doing these types of searches, you should use full text search. You need to read up in BOL about how to set this up as it is complicated. However when you have a wildcard as the first character, then SQL server cannot use indexes which is why this is slow.

like image 94
HLGEM Avatar answered Nov 15 '22 09:11

HLGEM


If there is an news Article key you can use a query to that joins back to itself such as:

select top 5 *
from NewsArticles as na
join 
    ( 
        SELECT distinct idNo , publishDate 
        FROM NewsArticles as n
        JOIN #SearchItem as s
        ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR 
        n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
    ) as sk
        on sk.idNo = na.idNo
ORDER BY sk.publishDate DESC
like image 34
Rawheiser Avatar answered Nov 15 '22 09:11

Rawheiser


Since you get multiple hits on multiple words, you can use the selected ID's as a filter for the actual selection of the articles:

Select TOP 5 * 
from NewsArticles 
where ID in (SELECT ID
    FROM NewsArticles as n
    JOIN #SearchItem as s
    ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR 
       n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
)
ORDER BY publishDate DESC

It should still be reasonably fast (compared to the original query) and duplicate-free.

(as in Rawheiser's response, there is an assumption that an ID field actually exists :))

like image 43
SWeko Avatar answered Nov 15 '22 10:11

SWeko


You could also try Full-Text Search. Something like

SELECT TOP 5 * FROM NewsArticles 
WHERE CONTAINS((headline,body), 'FORMSOF(INFLECTIONAL, sustainable) OR conservation OR FORMSOF(INFLECTIONAL, environmental)')

But, as HLGEM says, it will be important to read up on setting up FTS. Just creating an index for those two columns may be enough since you are searching for single words, but once you add phrases you want to start editing stop words and breakers and noise words.

like image 27
randcd Avatar answered Nov 15 '22 09:11

randcd