Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check against all joins when generating a score using MYSQL

I'm using MYSQL to generate a score for each result returned by a query. The results are then ordered by the score.

The part that doesn't seem to be working properly is when I'm trying to add a score for each tag that has been searched and the result is assigned to. So lets say I do a search for the tags "example", "test and "tag" and one of my results is assigned to the tags "example", "test", "someothertag" it should come up with a score of 10 since there are 2 matches.

What is actually happening is I'm getting a score of 5 if there is a match, regardless of how many tags are matched. and 0 if no tags are matched.

Here is an example of one of the queries that is generated from a search.

        SELECT DISTINCT results.*,
                    ( 
                        5*(MATCH(tags.name) AGAINST('"self employed"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"rental income"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"commission income"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"bankruptcy"' IN BOOLEAN MODE)) +
            5*(MATCH(tags.name) AGAINST('"condo approval"' IN BOOLEAN MODE)) +

                        1*usefulness + 
                        10*shares 
                    ) AS score 
        FROM results
        INNER JOIN categories c on results.ID = c.RESULT_ID
        INNER JOIN tags ON results.id = tags.result_id
        WHERE c.name in ('purchase', 'condo', 'va')
        AND ( tags.name = 'self employed' OR tags.name = 'rental income' OR tags.name = 'commission income' OR tags.name = 'bankruptcy' OR tags.name = 'condo approval'  )
        AND ( results.scope = 'all' OR results.scope = 'hi' )
        AND published = 1

        GROUP BY results.ID
        having count(distinct c.c_id) = 3
        ORDER BY score DESC 
        LIMIT 8 OFFSET 0
like image 282
Martin Hunt Avatar asked Apr 30 '13 16:04

Martin Hunt


2 Answers

As advised by Sam Dufel, you probably do not need full text search, especially since you are using exact string comparison in your WHERE clauses.

Moreover, because of the many-to-many relationship between results and categories (assumed from the HAVING COUNT(c_id) = 3 clause), I think in no way can you join both categories and tags in the same query.

Without the GROUP BY clause, for one given result, you would get one row for each matching category. For each matching pair (result, category), you would then get one row for each matching tag.name. I don't think there is a way to deal with such a result.

What I would suggest is:

Step 1: getting results present in all three categories

SELECT results.ID
FROM results
JOIN categories ON results.id = categories.result_id
WHERE categories.name IN ('purchase', 'condo', 'va')
GROUP BY results.ID
HAVING COUNT(DISTINCT c.c_id) = 3

Step 2: computing score of any results matching at least one search string

SELECT
    DISTINCT results.*, -- DISTINCT is redundant because of the GROUP BY clause
    ( 
        5*(COUNT(tags.result_id)) + -- you actually want to count the number of matches!
        1*usefulness +  -- warning, see below 
        10*shares       -- warning, see below
    ) AS score 
FROM results
INNER JOIN tags ON results.id = tags.result_id
WHERE
    tags.name = 'self employed'
    OR tags.name = 'rental income'
    OR tags.name = 'commission income'
    OR tags.name = 'bankruptcy'
    OR tags.name = 'condo approval'
GROUP BY results.ID

Step 3: putting it all together

SELECT
    results.*,
    ( 
        5*(COUNT(tags.result_id)) +
        1*usefulness +  -- warning, see below 
        10*shares       -- warning, see below
    ) AS score 
FROM (
        SELECT results.id
        FROM results
        JOIN categories ON results.id = categories.result_id
        WHERE
            categories.name IN ('purchase', 'condo', 'va')
            AND ( results.scope = 'all' OR results.scope = 'hi' )
            AND published = 1
        GROUP BY results.id
        HAVING COUNT(DISTINCT categories.c_id) = 3
) AS results_subset
JOIN results ON results_subset.id = results.id
JOIN tags ON results.id = tags.result_id
WHERE
    tags.name = 'self employed'
    OR tags.name = 'rental income'
    OR tags.name = 'commission income'
    OR tags.name = 'bankruptcy'
    OR tags.name = 'condo approval'
GROUP BY results.ID

Notice where I chose to include the conditions WHERE on scope and published. This choice is based on the principle that filters should be stated as early as possible. You may get better performance if you place them in the outer query but it really depends on cardinalities.

A word of warning: fields usefulness and shares are neither part of the GROUP BY function not included in an aggregation function. This is allowed by MySQL but highly dangerous. If usefulness and shares belong to a table other than result (the table being GROUP'ed BY), the values returned in your query are undefined.

like image 71
RandomSeed Avatar answered Sep 30 '22 08:09

RandomSeed


write it as follows:

   "sum((5*(MATCH(tags.name) AGAINST('"self employed"' IN BOOLEAN MODE))), 
        (5*(MATCH(tags.name) AGAINST('"rental income"' IN BOOLEAN MODE))) ,
        (5*(MATCH(tags.name) AGAINST('"commission income"' IN BOOLEAN MODE))),
        (5*(MATCH(tags.name) AGAINST('"bankruptcy"' IN BOOLEAN MODE))),
        (5*(MATCH(tags.name) AGAINST('"condo approval"' IN BOOLEAN MODE))),
      (1*usefulness), (10*shares)) as score"
like image 39
ayan sil Avatar answered Sep 30 '22 08:09

ayan sil