I've got a search function for news articles that looks like this (contains more than 5 search items):
SELECT TOP 5 *
FROM NewsArticles
WHERE (headline LIKE '% sustainable %'OR
headline LIKE '% sustainability %' OR
headline LIKE '% conservation %' OR
headline LIKE '% environmental % OR
headline LIKE '% environmentally %')
OR
(body LIKE '% sustainable %'OR
body LIKE '% sustainability %' OR
body LIKE '% conservation %' OR
body LIKE '% environmental % OR
body LIKE '% environmentally %')
ORDER BY publishDate DESC
This query is designed to pull out the top 5 news stories relating to sustainability and sits on my main sustainability homepage. However, it takes a while to run and the page is slow to load. So I'm looking up ways to speed this up. Having so many LIKE clauses seems cumbersome so I've tried something with a JOIN like this:
CREATE TABLE #SearchItem (Search varchar(255))
INSERT INTO #SearchItem VALUES
('sustainable'),
('sustainability'),
('conservation'),
('environmental'),
('environmentally')
SELECT TOP 5 *
FROM NewsArticles as n
JOIN #SearchItem as s
ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR
n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
ORDER BY n.publishDate DESC
This seems to work very well for performance, but seems to sometimes bring back duplicate articles where one of the search words appears in both the body and the headline (which is often the case). I've tried using the word using 'SELECT DISTINCT TOP 5 *' but this gives me an error saying 'The ntext data type cannot be selected as DISTINCT because it is not comparable'. Is there away of stopping this from bringing back duplicates without doing 2 separate searches and using UNION?
Not only LIKE, but you can also use multiple NOT LIKE conditions in SQL. You will get records of matching/non-matching characters with the LIKE – this you can achieve by percentile (%) wildcard character. Below use cases, help you know how to use single and multiple like conditions.
A where clause will generally increase the performance of the database. Generally, it is more expensive to return data and filter in the application. The database can optimize the query, using indexes and partitions. The database may be running in parallel, executing the query in parallel.
The way to make a query run faster is to reduce the number of calculations that the software (and therefore hardware) must perform. To do this, you'll need some understanding of how SQL actually makes calculations.
If you are doing these types of searches, you should use full text search. You need to read up in BOL about how to set this up as it is complicated. However when you have a wildcard as the first character, then SQL server cannot use indexes which is why this is slow.
If there is an news Article key you can use a query to that joins back to itself such as:
select top 5 *
from NewsArticles as na
join
(
SELECT distinct idNo , publishDate
FROM NewsArticles as n
JOIN #SearchItem as s
ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR
n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
) as sk
on sk.idNo = na.idNo
ORDER BY sk.publishDate DESC
Since you get multiple hits on multiple words, you can use the selected ID's as a filter for the actual selection of the articles:
Select TOP 5 *
from NewsArticles
where ID in (SELECT ID
FROM NewsArticles as n
JOIN #SearchItem as s
ON n.headline COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %' OR
n.body COLLATE DATABASE_DEFAULT LIKE '% ' + s.Search + ' %'
)
ORDER BY publishDate DESC
It should still be reasonably fast (compared to the original query) and duplicate-free.
(as in Rawheiser's response, there is an assumption that an ID field actually exists :))
You could also try Full-Text Search. Something like
SELECT TOP 5 * FROM NewsArticles
WHERE CONTAINS((headline,body), 'FORMSOF(INFLECTIONAL, sustainable) OR conservation OR FORMSOF(INFLECTIONAL, environmental)')
But, as HLGEM says, it will be important to read up on setting up FTS. Just creating an index for those two columns may be enough since you are searching for single words, but once you add phrases you want to start editing stop words and breakers and noise words.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With