I'm implementing a tag system similar to Stack Overflow's, and I'm wondering how to find related tags and define the relationship weights between them, like the "Related Tags" list on any tag page (for example https://stackoverflow.com/questions/tagged/php). They define the relationship weight by the co-occurrence of two or more tags.
How can I do this in PHP/MySQL, so that I can find the most related tags for tag "X" and keep all weights up to date as users add more and more posts/questions?
You probably want to look into statistics for this:
As for more information on step 5: this information changes only very slowly, so you can cache it and recreate it only when you have spare time.
What you want in the end is a relation
conditional_probability(X, Y, P)
Which tells you how probable (P) tag Y is, given X. P was calculated in step 4.
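A minimal sketch of how such a relation could be built, assuming the usual co-occurrence estimate P(Y | X) = (posts tagged with both X and Y) / (posts tagged with X). The steps referred to above are not reproduced here, so this estimate is an assumption, and the function names `conditional_probabilities` and `related_tags` are made up for illustration:

```python
from collections import Counter
from itertools import permutations


def conditional_probabilities(posts):
    """Return {(x, y): P(y | x)} estimated from tag co-occurrence.

    `posts` is an iterable of tag collections, one collection per post.
    Assumed estimate: count(posts with x and y) / count(posts with x).
    """
    tag_counts = Counter()   # number of posts carrying each tag
    pair_counts = Counter()  # number of posts carrying both x and y
    for tags in posts:
        unique = set(tags)   # ignore a tag repeated on the same post
        tag_counts.update(unique)
        # Ordered pairs, because P(y | x) and P(x | y) generally differ.
        pair_counts.update(permutations(sorted(unique), 2))
    return {(x, y): n / tag_counts[x] for (x, y), n in pair_counts.items()}


def related_tags(probabilities, tag, limit=10):
    """Return up to `limit` (other_tag, P) pairs for `tag`, most probable first."""
    ranked = sorted(
        ((y, p) for (x, y), p in probabilities.items() if x == tag),
        key=lambda item: (-item[1], item[0]),  # ties broken alphabetically
    )
    return ranked[:limit]
```

Calling `related_tags(conditional_probabilities(posts), "php")` would then give the candidates for a "Related Tags" list. The relation is asymmetric: a rare tag can imply a common one with high probability without the reverse being true.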
I used this blog entry for calculating relative tag size within a cloud. You can use this algorithm on the entire cloud or on a particular found set.
Instead of storing the denormalized weights for all tags in the database, I cache them in my (Ruby) process, and rebuild them when tags are added/removed or when the process restarts.
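The caching idea might look roughly like this, shown in Python rather than Ruby. It is only a sketch: `TagWeightCache` and `relative_tag_weights` are hypothetical names, and the weighting used here (each tag's post count divided by the largest count) is a stand-in, not the algorithm from the blog entry.

```python
from collections import Counter


def relative_tag_weights(posts):
    """Placeholder weighting: each tag's post count divided by the largest count."""
    counts = Counter(tag for tags in posts for tag in set(tags))
    if not counts:
        return {}
    top = max(counts.values())
    return {tag: n / top for tag, n in counts.items()}


class TagWeightCache:
    """Keeps computed weights in process memory and rebuilds them lazily."""

    def __init__(self, load_posts, compute=relative_tag_weights):
        self._load_posts = load_posts  # callable returning the current posts
        self._compute = compute        # callable turning posts into weights
        self._weights = None           # None means "needs rebuilding"

    def invalidate(self):
        """Call this whenever a tag is added to or removed from an item."""
        self._weights = None

    def weights(self):
        """Return cached weights, recomputing only after an invalidation."""
        if self._weights is None:
            self._weights = self._compute(self._load_posts())
        return self._weights
```

A process restart empties the cache, so the first call to `weights()` after a restart rebuilds everything, which matches the behaviour described above.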
As for how to store them, you generally want:
Once you have that, and once you have a found set of items on a results page, it's a simple join and unique to find out the set of 'related' tags.
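As a rough illustration of the "join and unique" step, here is a sketch using Python's built-in `sqlite3` module. The schema (`tags` and `taggings` tables) is an assumption made for this example, since the storage layout is not spelled out above; the `SELECT DISTINCT ... JOIN` itself should carry over to MySQL largely unchanged.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with an assumed tags/taggings schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE taggings (
            item_id INTEGER NOT NULL,
            tag_id  INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (item_id, tag_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO tags (id, name) VALUES (?, ?)",
        [(1, "php"), (2, "mysql"), (3, "arrays"), (4, "ruby")],
    )
    conn.executemany(
        "INSERT INTO taggings (item_id, tag_id) VALUES (?, ?)",
        [(1, 1), (1, 2), (2, 1), (2, 3), (3, 4)],
    )
    return conn


def related_tags_for_items(conn, item_ids):
    """Return the distinct tag names attached to the given found set of items."""
    item_ids = list(item_ids)
    if not item_ids:
        return []
    # One "?" placeholder per id keeps the query parameterised.
    placeholders = ",".join("?" for _ in item_ids)
    rows = conn.execute(
        f"""
        SELECT DISTINCT t.name
        FROM taggings AS tg
        JOIN tags AS t ON t.id = tg.tag_id
        WHERE tg.item_id IN ({placeholders})
        ORDER BY t.name
        """,
        item_ids,
    )
    return [name for (name,) in rows]
```

With the demo data, passing the found set `[1, 2]` collects the tags of both items once each, so a tag shared by several results is not repeated.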