I'm using MYSQL to generate a score for each result returned by a query. The results are then ordered by the score.
The part that doesn't seem to be working properly is when I'm trying to add a score for each tag that has been searched and the result is assigned to. So lets say I do a search for the tags "example", "test and "tag" and one of my results is assigned to the tags "example", "test", "someothertag" it should come up with a score of 10 since there are 2 matches.
What is actually happening is I'm getting a score of 5 if there is a match, regardless of how many tags are matched. and 0 if no tags are matched.
Here is an example of one of the queries that is generated from a search.
SELECT DISTINCT results.*,
(
5*(MATCH(tags.name) AGAINST('"self employed"' IN BOOLEAN MODE)) +
5*(MATCH(tags.name) AGAINST('"rental income"' IN BOOLEAN MODE)) +
5*(MATCH(tags.name) AGAINST('"commission income"' IN BOOLEAN MODE)) +
5*(MATCH(tags.name) AGAINST('"bankruptcy"' IN BOOLEAN MODE)) +
5*(MATCH(tags.name) AGAINST('"condo approval"' IN BOOLEAN MODE)) +
1*usefulness +
10*shares
) AS score
FROM results
INNER JOIN categories c on results.ID = c.RESULT_ID
INNER JOIN tags ON results.id = tags.result_id
WHERE c.name in ('purchase', 'condo', 'va')
AND ( tags.name = 'self employed' OR tags.name = 'rental income' OR tags.name = 'commission income' OR tags.name = 'bankruptcy' OR tags.name = 'condo approval' )
AND ( results.scope = 'all' OR results.scope = 'hi' )
AND published = 1
GROUP BY results.ID
having count(distinct c.c_id) = 3
ORDER BY score DESC
LIMIT 8 OFFSET 0
As advised by Sam Dufel, you probably do not need full text search, especially since you are using exact string comparison in your WHERE
clauses.
Moreover, because of the many-to-many relationship between results
and categories
(assumed from the HAVING COUNT(c_id) = 3
clause), I think in no way can you join both categories
and tags
in the same query.
Without the GROUP BY
clause, for one given result
, you would get one row for each matching category
. For each matching pair (result
, category
), you would then get one row for each matching tag.name
. I don't think there is a way to deal with such a result.
What I would suggest is:
Step 1: getting results
present in all three categories
SELECT results.ID
FROM results
JOIN categories ON results.id = categories.result_id
WHERE categories.name IN ('purchase', 'condo', 'va')
GROUP BY results.ID
HAVING COUNT(DISTINCT c.c_id) = 3
Step 2: computing score of any results
matching at least one search string
SELECT
DISTINCT results.*, -- DISTINCT is redundant because of the GROUP BY clause
(
5*(COUNT(tags.result_id)) + -- you actually want to count the number of matches!
1*usefulness + -- warning, see below
10*shares -- warning, see below
) AS score
FROM results
INNER JOIN tags ON results.id = tags.result_id
WHERE
tags.name = 'self employed'
OR tags.name = 'rental income'
OR tags.name = 'commission income'
OR tags.name = 'bankruptcy'
OR tags.name = 'condo approval'
GROUP BY results.ID
Step 3: putting it all together
SELECT
results.*,
(
5*(COUNT(tags.result_id)) +
1*usefulness + -- warning, see below
10*shares -- warning, see below
) AS score
FROM (
SELECT results.id
FROM results
JOIN categories ON results.id = categories.result_id
WHERE
categories.name IN ('purchase', 'condo', 'va')
AND ( results.scope = 'all' OR results.scope = 'hi' )
AND published = 1
GROUP BY results.id
HAVING COUNT(DISTINCT categories.c_id) = 3
) AS results_subset
JOIN results ON results_subset.id = results.id
JOIN tags ON results.id = tags.result_id
WHERE
tags.name = 'self employed'
OR tags.name = 'rental income'
OR tags.name = 'commission income'
OR tags.name = 'bankruptcy'
OR tags.name = 'condo approval'
GROUP BY results.ID
Notice where I chose to include the conditions WHERE on scope
and published
. This choice is based on the principle that filters should be stated as early as possible. You may get better performance if you place them in the outer query but it really depends on cardinalities.
A word of warning: fields usefulness
and shares
are neither part of the GROUP BY
function not included in an aggregation function. This is allowed by MySQL but highly dangerous. If usefulness
and shares
belong to a table other than result
(the table being GROUP'ed BY), the values returned in your query are undefined.
write it as follows:
"sum((5*(MATCH(tags.name) AGAINST('"self employed"' IN BOOLEAN MODE))),
(5*(MATCH(tags.name) AGAINST('"rental income"' IN BOOLEAN MODE))) ,
(5*(MATCH(tags.name) AGAINST('"commission income"' IN BOOLEAN MODE))),
(5*(MATCH(tags.name) AGAINST('"bankruptcy"' IN BOOLEAN MODE))),
(5*(MATCH(tags.name) AGAINST('"condo approval"' IN BOOLEAN MODE))),
(1*usefulness), (10*shares)) as score"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With