MySQL order by the number of matches in an intermediate table

Tags: database, mysql

So I have a query that is trying to grab "related posts".

Categories have a one-to-many relationship with posts. Tags have a many-to-many relationship with posts. So my tables look roughly like this:

posts table:
id | category_id | ... | ...

tags table:
id | ... | ...

post_tag intermediate table:
id | post_id | tag_id | ... | ...

So I have a single Post row already, and I want to grab its "related" posts. My logic is roughly that I want to grab only posts that are in the same category, but order those posts by the number of tags that match the original post. Another post in the same category that has exactly the same tags as the original post should be a very high match, whereas a post that only matches 3/4 of the tags should show up lower in the results.

Here is what I have so far:

SELECT *
FROM posts AS p
WHERE p.category_id=?
ORDER BY ( SELECT COUNT(id) 
           FROM post_tag AS i 
           WHERE i.tag_id IN( ? )
         )
LIMIT 5

BINDINGS: initial post's category ID; initial post's tag IDs;

Clearly this is not going to actually order the results by the correct values in the sub-select. I am having trouble trying to think of how to join this to achieve the correct results.

Thanks in advance!

Conar Welsh asked Oct 06 '12 17:10


2 Answers

If I understood your question correctly, this is what you're looking for:

SELECT p.*, 
       Count(pt.tag_id) AS ord 
FROM   posts AS currentpost 
       JOIN posts AS p 
         ON p.category_id = currentpost.category_id 
            AND p.id != currentpost.id 
       JOIN post_tag AS pt 
         ON pt.post_id = p.id 
            AND pt.tag_id IN (SELECT tag_id 
                              FROM   post_tag 
                              WHERE  post_id = currentpost.id) 
WHERE  currentpost.id = ? 
GROUP  BY p.id 
ORDER  BY ord DESC 

BINDINGS: Initial posts.id;

With my version you only have to specify the id of the current post, so you don't have to fetch the post's tags beforehand and format them suitably for an IN clause.
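To see the ordering on concrete rows, here is a minimal sketch that runs the same self-join against a tiny in-memory database. Everything in it is an assumption made for illustration: SQLite (via Python's sqlite3 module) stands in for MySQL, the sample rows are made up, and it selects p.id instead of p.* to keep the output short.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE post_tag (id INTEGER PRIMARY KEY, post_id INTEGER, tag_id INTEGER);
    INSERT INTO posts (id, category_id) VALUES (1, 10), (2, 10), (3, 10), (4, 20), (5, 10);
    INSERT INTO post_tag (post_id, tag_id) VALUES
        (1, 100), (1, 101), (1, 102),   -- the current post
        (2, 100),                       -- same category, 1 shared tag
        (3, 100), (3, 101), (3, 102),   -- same category, 3 shared tags
        (4, 100), (4, 101), (4, 102),   -- different category
        (5, 999);                       -- same category, no shared tags
""")

# Same shape as the query above: join the current post to its category
# siblings, then count the tags each sibling shares with it.
RELATED_SQL = """
    SELECT p.id, COUNT(pt.tag_id) AS ord
    FROM posts AS currentpost
    JOIN posts AS p
      ON p.category_id = currentpost.category_id
     AND p.id != currentpost.id
    JOIN post_tag AS pt
      ON pt.post_id = p.id
     AND pt.tag_id IN (SELECT tag_id FROM post_tag WHERE post_id = currentpost.id)
    WHERE currentpost.id = ?
    GROUP BY p.id
    ORDER BY ord DESC
"""

related = conn.execute(RELATED_SQL, (1,)).fetchall()
print(related)  # [(3, 3), (2, 1)]
```

Post 5 is missing from the result because the inner join on post_tag drops posts that share no tags. If you want those listed too (with a count of 0), a LEFT JOIN on post_tag should do it.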

EDIT: This should be a faster query because it avoids joining posts twice. If you don't like user variables, just replace every @currentpostid with ? and bind the post id three times:

set @currentpostid = ?;
select p.*, count(pt.tag_id) as ord
from posts as p
join post_tag as pt
    on pt.post_id = p.id
    and pt.tag_id in (select tag_id from post_tag where post_id = @currentpostid)
where p.category_id = (select category_id from posts where id=@currentpostid)
    and p.id != @currentpostid
group by p.id
order by ord desc;
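
If binding the same id three times feels clumsy, some drivers accept named placeholders, so the value is supplied once. A rough sketch of that idea, assuming Python's sqlite3 module as the driver (SQLite stands in for MySQL, and the placeholder name :current_post_id and the sample rows are made up for this example; whether your own MySQL driver supports named placeholders is something to check):

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE post_tag (id INTEGER PRIMARY KEY, post_id INTEGER, tag_id INTEGER);
    INSERT INTO posts (id, category_id) VALUES (1, 7), (2, 7), (3, 7), (4, 8);
    INSERT INTO post_tag (post_id, tag_id) VALUES
        (1, 1), (1, 2),   -- the current post
        (2, 1), (2, 2),   -- same category, 2 shared tags
        (3, 2), (3, 3),   -- same category, 1 shared tag
        (4, 1), (4, 2);   -- different category
""")

# The named placeholder appears three times but is bound only once.
RANKED_SQL = """
    SELECT p.id, COUNT(pt.tag_id) AS ord
    FROM posts AS p
    JOIN post_tag AS pt
      ON pt.post_id = p.id
     AND pt.tag_id IN (SELECT tag_id FROM post_tag WHERE post_id = :current_post_id)
    WHERE p.category_id = (SELECT category_id FROM posts WHERE id = :current_post_id)
      AND p.id != :current_post_id
    GROUP BY p.id
    ORDER BY ord DESC
"""

ranked = conn.execute(RANKED_SQL, {"current_post_id": 1}).fetchall()
print(ranked)  # [(2, 2), (3, 1)]
```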
xception answered Nov 03 '22 00:11


Try this:

SELECT posts.* 
FROM   posts,(SELECT p.id, 
                     Count(pt.tag_id) AS count_tag 
              FROM   posts AS p, 
                     post_tag AS pt 
              WHERE  p.category_id = '***' 
                     AND pt.post_id = p.id 
                     AND p.id != '***' 
                     AND pt.tag_id IN(SELECT tag_id 
                                      FROM   post_tag 
                                      WHERE  post_tag.post_id = '***') 
              GROUP  BY p.id 
              ) temp
WHERE  posts.id = temp.id 
ORDER  BY temp.count_tag DESC

Replace the *** in the category_id condition with the category id of the post row you already have, and every other *** with that post's id.
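
For a quick check of the derived-table approach, here is a small sketch with made-up rows. The assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL, related_posts is a hypothetical helper name, the comma joins are written as explicit JOINs, and the inner query carries a condition that keeps the original post out of its own results.

```python
import sqlite3

def related_posts(conn, post_id, category_id):
    """Rank posts in category_id by how many tags they share with post_id."""
    sql = """
        SELECT posts.id, temp.count_tag
        FROM posts
        JOIN (SELECT p.id, COUNT(pt.tag_id) AS count_tag
              FROM posts AS p
              JOIN post_tag AS pt ON pt.post_id = p.id
              WHERE p.category_id = ?
                AND p.id != ?            -- keep the original post out
                AND pt.tag_id IN (SELECT tag_id FROM post_tag WHERE post_id = ?)
              GROUP BY p.id) AS temp
          ON posts.id = temp.id
        ORDER BY temp.count_tag DESC
    """
    return conn.execute(sql, (category_id, post_id, post_id)).fetchall()

# Hypothetical sample data; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE post_tag (id INTEGER PRIMARY KEY, post_id INTEGER, tag_id INTEGER);
    INSERT INTO posts (id, category_id) VALUES (1, 5), (2, 5), (3, 5), (4, 5), (5, 6);
    INSERT INTO post_tag (post_id, tag_id) VALUES
        (1, 1), (1, 2), (1, 3), (1, 4),   -- the original post
        (2, 1), (2, 2), (2, 3),           -- matches 3 of 4 tags
        (3, 1), (3, 2), (3, 3), (3, 4),   -- matches all 4 tags
        (4, 1),                           -- matches 1 tag
        (5, 1), (5, 2), (5, 3), (5, 4);   -- different category
""")

result = related_posts(conn, post_id=1, category_id=5)
print(result)  # [(3, 4), (2, 3), (4, 1)]
```

Without the p.id != ? condition, the original post would match all of its own tags and come back as the top result.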

Ankur answered Nov 02 '22 22:11