How can I compare a group of tags to another post's tags in my database to get related posts?
What I'm trying to do is compare the group of tags on one post to another post's tags as a whole, rather than tag by tag. Say you wanted to get truly related items based on a post's tags and then show them from the most related to the least related. Three related items must always be shown, no matter the relationship level.
Post A has the tags: "architecture", "wood", "modern", "switzerland"
Post B has the tags: "architecture", "wood", "modern"
Post C has the tags: "architecture", "modern", "stone"
Post D has the tags: "architecture", "house", "residence"
Post B is related to post A by 75% (3 related tags)
Post C is related to post A by 50% (2 related tags)
Post D is related to post A by 25% (1 related tag)
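To make the percentages above concrete, here is a minimal Python sketch of the ranking being asked for. It assumes relatedness means "shared tags divided by the number of tags on the source post", which is what the 75% / 50% / 25% figures imply. The names `relatedness` and `related_posts` are hypothetical helpers for illustration only, not part of any existing code.

```python
# Tags for the four example posts, held in memory instead of the database.
posts = {
    "A": {"architecture", "wood", "modern", "switzerland"},
    "B": {"architecture", "wood", "modern"},
    "C": {"architecture", "modern", "stone"},
    "D": {"architecture", "house", "residence"},
}


def relatedness(source_tags, other_tags):
    """Percentage of the source post's tags that also appear on the other post."""
    return 100 * len(source_tags & other_tags) / len(source_tags)


def related_posts(all_posts, source_id, limit=3):
    """Return up to `limit` (post_id, percentage) pairs, most related first."""
    source_tags = all_posts[source_id]
    scores = [
        (post_id, relatedness(source_tags, tags))
        for post_id, tags in all_posts.items()
        if post_id != source_id
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:limit]


ranking = related_posts(posts, "A")
print(ranking)  # [('B', 75.0), ('C', 50.0), ('D', 25.0)]
```

Because every other post is scored, including those with no tags in common, this version always returns three items as long as at least three other posts exist.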
How can I do that? I'm currently using three tables:
posts
> id
> image
> date
post_tags
> post_id
> tag_id
tags
> id
> name
I have searched the Internet and Stack Overflow to find out how to do this. My closest find was How to find "related items" in PHP, but it actually didn't solve much for me.
NOTE: This solution is MySQL only, as MySQL has its own interpretation of GROUP BY
I've also used my own calculation of similarity. I've taken the number of identical tags and divided it by the average tag count of post A and post B. So if post A has 4 tags, and post B has 2 tags which are both shared with A, the similarity is about 67%.
(SHARED:2) / ((A:4 + B:2) / 2)
or (SHARED:2) / (AVG:3)
It should be easy to change the formula if you want/need to...
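As a rough illustration of that formula outside SQL, the following Python sketch computes the same "shared tags divided by average tag count" score for two tag sets. The function name `similarity` and the sample tag sets are assumptions made for the example.

```python
def similarity(tags_a, tags_b):
    """Shared tag count divided by the average tag count of the two posts."""
    shared = len(tags_a & tags_b)
    average = (len(tags_a) + len(tags_b)) / 2
    # Two posts without any tags have nothing in common; avoid dividing by zero.
    return shared / average if average else 0.0


post_a = {"architecture", "wood", "modern", "switzerland"}  # 4 tags
post_b = {"architecture", "wood"}                           # 2 tags, both shared

score = similarity(post_a, post_b)
print(round(score, 2))  # 0.67
```

This reproduces the two-thirds example: 2 shared tags over an average of 3. Unlike a score based only on the source post's tag count, it is symmetric, so `similarity(a, b)` equals `similarity(b, a)`.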
SELECT
    sourcePost.id,
    targetPost.id,
    /* COUNT NUMBER OF IDENTICAL TAGS */
    /* REF GROUPING OF sourcePost.id and targetPost.id BELOW */
    COUNT(targetPost.id) /
    (
        (
            /* TOTAL TAGS IN SOURCE POST */
            (SELECT COUNT(*) FROM post_tags WHERE post_id = sourcePost.id)
            +
            /* TOTAL TAGS IN TARGET POST */
            (SELECT COUNT(*) FROM post_tags WHERE post_id = targetPost.id)
        ) / 2 /* AVERAGE TAGS IN SOURCE + TARGET */
    ) as similarity
FROM
    posts sourcePost
LEFT JOIN
    post_tags sourcePostTags ON (sourcePost.id = sourcePostTags.post_id)
INNER JOIN
    post_tags targetPostTags ON (sourcePostTags.tag_id = targetPostTags.tag_id
        AND
        sourcePostTags.post_id != targetPostTags.post_id)
LEFT JOIN
    posts targetPost ON (targetPostTags.post_id = targetPost.id)
GROUP BY
    sourcePost.id, targetPost.id
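The query as written scores every pair of posts. To get the three most related posts for one particular post, as the question asks, one option is to add a WHERE, ORDER BY and LIMIT. The sketch below tries this against the example data from the question. Several things here are assumptions rather than part of the answer: it uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server, it divides by 2.0 because SQLite would otherwise do integer division, and the WHERE / ORDER BY / LIMIT clauses and the `target_id` alias are additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, image TEXT, date TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
""")

# Example data from the question: post ids 1..4 stand for posts A..D.
example = {
    1: ["architecture", "wood", "modern", "switzerland"],
    2: ["architecture", "wood", "modern"],
    3: ["architecture", "modern", "stone"],
    4: ["architecture", "house", "residence"],
}
tag_ids = {}
for post_id, names in example.items():
    conn.execute("INSERT INTO posts (id) VALUES (?)", (post_id,))
    for name in names:
        if name not in tag_ids:
            cur = conn.execute("INSERT INTO tags (name) VALUES (?)", (name,))
            tag_ids[name] = cur.lastrowid
        conn.execute("INSERT INTO post_tags VALUES (?, ?)", (post_id, tag_ids[name]))

query = """
    SELECT
        targetPost.id AS target_id,
        COUNT(targetPost.id) / (
            (
                (SELECT COUNT(*) FROM post_tags WHERE post_id = sourcePost.id)
                +
                (SELECT COUNT(*) FROM post_tags WHERE post_id = targetPost.id)
            ) / 2.0
        ) AS similarity
    FROM posts sourcePost
    LEFT JOIN post_tags sourcePostTags ON sourcePost.id = sourcePostTags.post_id
    INNER JOIN post_tags targetPostTags
        ON sourcePostTags.tag_id = targetPostTags.tag_id
        AND sourcePostTags.post_id != targetPostTags.post_id
    LEFT JOIN posts targetPost ON targetPostTags.post_id = targetPost.id
    WHERE sourcePost.id = ?          -- added: restrict to one source post
    GROUP BY sourcePost.id, targetPost.id
    ORDER BY similarity DESC         -- added: most related first
    LIMIT 3                          -- added: only three related posts
"""
related = conn.execute(query, (1,)).fetchall()
print([row[0] for row in related])  # [2, 3, 4]
```

Note that the INNER JOIN only returns posts sharing at least one tag with the source post, so fewer than three rows can come back. If three items must always be shown, the remaining slots would need to be filled some other way, for example with the most recent posts.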