I am doing a search in two text fields called Subject
and Text
for a specific keyword. To do this I use the LIKE
statement. I have encountered a problem when trying to sort the results by the number of occurrences.
my search query looks like this:
SELECT * FROM Table WHERE (Text LIKE '%Keyword%' OR Subject LIKE '%Keyword%')
I tried to add a count() statement and sort it by the number of occurrences, but the count() statement just keep returning the number of rows in my table.
Here is the query with count statement:
SELECT *, COUNT(Text LIKE '%Keyword%') AS cnt FROM News WHERE (Text LIKE '%Keyword%' OR Subject LIKE '%Keyword%') ORDER BY cnt
What im looking for is something that returns the number of matches on the Subject and Text columns on each row, and then order the result after the highest amount of occurrences of the keyword on each row.
Below query can give you the no.of occurrences of string appears in both columns i.e text and subject and will sort results by the criteria but this will not be a good solution performance wise its better to sort the results in your application code level
SELECT *,
(LENGTH(`Text`) - LENGTH(REPLACE(`Text`, 'Keyword', ''))) / LENGTH('Keyword')
+
(LENGTH(`Subject`) - LENGTH(REPLACE(`Subject`, 'Keyword', ''))) / LENGTH('Keyword') `occurences`
FROM
`Table`
WHERE (Text LIKE '%Keyword%' OR Subject LIKE '%Keyword%')
ORDER BY `occurences` DESC
Suggested by @lserni a more cleaner way of calculation of occurrences
SELECT *,
(LENGTH(`Text`) - LENGTH(REPLACE(`Text`, 'test', ''))) / LENGTH('test') `appears_in_text`,
(LENGTH(`Subject`) - LENGTH(REPLACE(`Subject`, 'test', ''))) / LENGTH('test') `appears_in_subject`,
(LENGTH(CONCAT(`Text`,' ',`Subject`)) - LENGTH(REPLACE(CONCAT(`Text`,' ',`Subject`), 'test', ''))) / LENGTH('test') `occurences`
FROM
`Table1`
WHERE (TEXT LIKE '%test%' OR SUBJECT LIKE '%test%')
ORDER BY `occurences` DESC
You want SUM
instead. Count will count how many records have non-null values, which means ALL matches and NON-matches will be counted.
SELECT *, SUM(Text LIKE '%Keyword') AS total_matches
...
ORDER BY total_matches
SUM() will count up how many boolean true results the LIKE produces, which will be typecast to integers, so you get a result like 1+1+1+0+1 = 4, instead of the 5 non-nulls count.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With