Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to design a MySql Table for a Tag Cloud?

I have articles on my site, and I would like to add tags which would describe each article, but I'm having problems with design mysql table for tags. I have two ideas:

  1. each article would have field "tags", and tags would be in format: "tag1,tag2,tag3"
  2. create other table called tags with fields: tag_name, article_id

So when I want tags for article with ID 1, I would run

SELECT ... FROM tags WHERE `article_id`=1;

But, I would also like to know 3 most similar articles by comparing tags, so if I have article which has tags "php,mysql,erlang", and 5 articles with tags: "php,mysql", "erlang,ruby", "php erlang", "mysql,erlang,javascript", I would choose 1., 3. and 4., since those 3 have most same tags with main article.

Also other question, what is the best way to get 10 "most used tags" ?

like image 988
mfolnovich Avatar asked Apr 08 '10 19:04

mfolnovich


2 Answers

Generally, for this kind of many-to-many relationship, there are three tables :

  • The "article" table
    • primary key = id
  • The "tag" table
    • primary key = id
    • contains the data of each tag :
      • name, for example
  • A "tags_articles" table, which acts as a join table, and contains only :
    • id_article : foreign key that points to an article
    • id_tag : foreign key that points to a tag


This way, there is no duplication of any tag's data : for each tag, there is one, and only one, line in the tag table.

And, for each article, you can have several tags (i.e. several lines in the tags_articles table) ; and, of course, for each tags, you can have several articles.

Getting a list of tags for an article, with this idea, is a matter of an additionnal query, like :

select tag.*
from tag
    inner join tags_articles on tag.id = tags_articles.id_tag
where tags_articles.id_article = 123


Getting the three "most similar" articles would mean :

  • select articles that have tags that the first article has
  • only use those which have the most important number of identical tags

Not tested, but an idea might be something that would look like this :

select article.id, count(*) as nb_identical_tags
from article
    inner join tags_articles on tags_articles.id_article = article.id
    inner join tag on tag.id = tags_articles.id_tag
where tag.name in ('php', 'mysql', 'erlang')
      and article.id <> 123
group by article.id
order by count(*) desc
limit 3

Basically, you :

  • select the articles ids for each tag that's present on your initial article
    • as there's an inner join, if an article in the DB has 2 tags that match the where clause, without the group by clause, there would be two lines for that article
    • of course, you don't want to re-select the article you already had -- which means it has to be excluded.
  • but, as you use group by article.id, there will be only one line per article
    • but you'll be able to use count, to find out how many tags each article has in common with the initial one
  • then, it's only a matter of sorting per number of tags, and getting only the third three lines.
like image 155
Pascal MARTIN Avatar answered Oct 03 '22 15:10

Pascal MARTIN


First off, you'll want to use Pascal MARTIN's suggestion about the table design.

As for finding similar articles, here's something to get you started. Given that @article_id is the article you want to find matches for, and @tag1, @tag2, @tag3 are the tags for that article:

SELECT article_id, count(*)
FROM tags_articles
WHERE article_id <> @article_id
AND tag_id IN (@tag1, @tag2, @tag3)
GROUP BY article_id
ORDER BY count(*) DESC
LIMIT 3
like image 40
Eric Petroelje Avatar answered Oct 05 '22 15:10

Eric Petroelje