
How to build a tagging system like Stack Overflow's

I'm implementing a tag system similar to Stack Overflow's, and I'm wondering how to find related tags and define the relationship weights between them, like the "Related Tags" list on any tag page (for example https://stackoverflow.com/questions/tagged/php). They define the relationship weight by how often two or more tags co-occur.

How can I do this in PHP/MySQL: determine the most related tags for a tag "X", and keep all the weights up to date as users add more and more posts/questions?

Zamblek asked Nov 17 '10 07:11



2 Answers

You probably want to look into statistics for this:

  1. given a tag X
  2. check all other tags Y
  3. count how often Y and X show up at the same time
  4. divide by how often X shows up
  5. ???
  6. Profit!!!
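Steps 1 to 4 can be sketched in a few lines of Python. This is only an illustration, assuming the weight is meant to be the conditional probability of Y given X (co-occurrence count divided by the number of posts tagged X). The function name `related_tags` and the sample posts are made up for the example.

```python
from collections import Counter

def related_tags(posts, tag):
    """Return {other_tag: P(other_tag | tag)} for tags that co-occur with `tag`."""
    tag_count = 0          # number of posts carrying `tag`
    together = Counter()   # co-occurrence count per other tag
    for tags in posts:
        tags = set(tags)
        if tag not in tags:
            continue
        tag_count += 1
        for other in tags - {tag}:
            together[other] += 1
    if tag_count == 0:
        return {}
    return {other: n / tag_count for other, n in together.items()}

# Hypothetical sample data: each post is just its set of tags.
posts = [
    {"php", "mysql"},
    {"php", "mysql", "laravel"},
    {"php", "javascript"},
    {"python", "mysql"},
]
weights = related_tags(posts, "php")
ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
```

Here `ranked` lists the tags most related to "php" first. For a real site you would not scan every post on each request, which is where the caching below comes in.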

As for step 5: these statistics change only very slowly, so you can cache the results and recompute them only when the server has spare time.

What you want in the end is a relation

conditional_probability(X, Y, P)

Which tells you how probable (P) tag Y is, given X. P was calculated in step 4.
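One possible way to materialize such a relation is a table that is rebuilt from scratch by a periodic job. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL. The `tags_items` mapping table, the column names `x`, `y`, `p`, and the `rebuild` function are all assumptions for illustration, not something the original post specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags_items (
    tag_id  INTEGER NOT NULL,
    item_id INTEGER NOT NULL,
    PRIMARY KEY (tag_id, item_id)
);
CREATE TABLE conditional_probability (
    x INTEGER NOT NULL,   -- the given tag
    y INTEGER NOT NULL,   -- the related tag
    p REAL    NOT NULL,   -- P(y | x)
    PRIMARY KEY (x, y)
);
""")
# Hypothetical sample data: (tag_id, item_id) pairs.
conn.executemany("INSERT INTO tags_items VALUES (?, ?)", [
    (1, 10), (2, 10),
    (1, 11), (2, 11), (3, 11),
    (1, 12),
    (2, 13),
])

def rebuild(conn):
    """Recompute every P(y | x) in one transaction; meant to run from a periodic job."""
    with conn:
        conn.execute("DELETE FROM conditional_probability")
        conn.execute("""
            INSERT INTO conditional_probability (x, y, p)
            SELECT a.tag_id, b.tag_id,
                   COUNT(*) * 1.0 /
                   (SELECT COUNT(*) FROM tags_items t WHERE t.tag_id = a.tag_id)
            FROM tags_items a
            JOIN tags_items b
              ON a.item_id = b.item_id AND a.tag_id <> b.tag_id
            GROUP BY a.tag_id, b.tag_id
        """)

rebuild(conn)
rows = conn.execute(
    "SELECT y, p FROM conditional_probability WHERE x = ? ORDER BY p DESC, y",
    (1,),
).fetchall()
```

Looking up the related tags for X is then a single indexed read. The self-join on `tags_items` is the expensive part, which is why it runs in the rebuild job rather than per request.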

Daren Thomas answered Sep 17 '22 11:09



I used this blog entry for calculating relative tag size within a cloud. You can use this algorithm on the entire cloud or on a particular found set.

Instead of storing the denormalized weights for all tags in the database, I cache them in my (Ruby) process, and rebuild them when tags are added/removed or when the process restarts.
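A rough sketch of that kind of in-process cache, written in Python rather than Ruby, might look like the following. The class name `TagWeightCache` and the weight formula (each tag's share of total usage) are placeholders for illustration. It is not the algorithm from the blog entry.

```python
class TagWeightCache:
    """Holds computed tag weights in memory and rebuilds them lazily after invalidation."""

    def __init__(self, compute):
        self._compute = compute   # callable returning {tag: weight}
        self._weights = None      # None means "stale, rebuild on next read"

    def invalidate(self):
        """Call whenever a tag is added or removed."""
        self._weights = None

    def weights(self):
        if self._weights is None:
            self._weights = self._compute()
        return self._weights


# Hypothetical usage counts standing in for the real database.
tag_usage = {"php": 3, "mysql": 2}
calls = []

def compute_weights():
    calls.append(1)  # track how often the expensive rebuild runs
    total = sum(tag_usage.values())
    return {tag: n / total for tag, n in tag_usage.items()}

cache = TagWeightCache(compute_weights)
first = cache.weights()    # triggers a rebuild
second = cache.weights()   # served from memory
tag_usage["laravel"] = 1   # a tag was added ...
cache.invalidate()         # ... so mark the cache stale
third = cache.weights()    # rebuilt with the new tag
```

A process restart needs no special handling, since a fresh cache starts out stale and rebuilds on first read.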

As for how to store them, you generally want:

  1. A tags table associating unique tag names with row IDs, and
  2. A tags_items table providing you with your n-to-n mapping between tags and items.

Once you have that, and once you have a found set of items on a results page, a simple join plus a unique (DISTINCT) is all it takes to find the set of 'related' tags.
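As a minimal sketch of the two tables and the join, here is a Python example using an in-memory SQLite database as a stand-in for MySQL. The column names, the sample rows, and the `related_tag_names` helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tags_items (
    tag_id  INTEGER NOT NULL REFERENCES tags(id),
    item_id INTEGER NOT NULL,
    PRIMARY KEY (tag_id, item_id)
);
INSERT INTO tags (id, name) VALUES
    (1, 'php'), (2, 'mysql'), (3, 'laravel'), (4, 'python');
INSERT INTO tags_items (tag_id, item_id) VALUES
    (1, 10), (2, 10),
    (1, 11), (3, 11),
    (4, 12), (2, 12);
""")

def related_tag_names(conn, item_ids):
    """Return the distinct tag names attached to the given found set of items."""
    item_ids = list(item_ids)
    if not item_ids:
        return []  # avoid an empty IN () list, which MySQL rejects
    placeholders = ",".join("?" for _ in item_ids)
    sql = f"""
        SELECT DISTINCT t.name
        FROM tags_items ti
        JOIN tags t ON t.id = ti.tag_id
        WHERE ti.item_id IN ({placeholders})
        ORDER BY t.name
    """
    return [name for (name,) in conn.execute(sql, item_ids)]

found = related_tag_names(conn, [10, 11])
```

Only the placeholders are interpolated into the SQL string. The item IDs themselves are passed as bound parameters.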

Phrogz answered Sep 17 '22 11:09
