Consider these 3 table structures. Which will perform these queries the best.
Article
-------
ArticleID int
Article_Tag
------------
ArticleTagID int
ArticleID int
TagID int
Tag
---
TagID int
TagText varchar(50)
Article
-------
articleID int
Article_Tag
-----------
articleTagID int
articleID int
tag varchar(50)
Article
-------
ArticleID int
Article_Tag
------------
ArticleTagID int
ArticleID int
Tag varchar(50)
Tag
---
Tag varchar(50)
Sample queries:
Select articleID from Article a inner join Article_tag at on a.articleID = at.articleID and tag = 'apple'
Select tag from Tags -- or, for structure 2
Select distinct tag from article_tag
It depends if you ever want to be able to change the tag-text globally. You could, of course, issue a wide UPDATE
on Article_Tag
, but if you need to be able to do this, then being able to just update the value in Tag
would be simpler. Some servers offer automatic updates (such as ON UPDATE CASCADE
in SQL Server), but these aren't necessarily cheap (it still has to UPDATE
a lot of rows and any indexes defined).
But if you don't need this, then it should be a bit quicker with the literal in Article_Tag
, as it can remove a join - a lot of times. Obviously, index it etc.
The additional space required for the repeated literal is a factor, but disk space is usually cheaper than a faster server.
As for being a primary key; unless you have other data to store, why would you even need the table any more? You could use DISTINCT
on Article_Tag
just as easily, especially as Tag
should be indexed (so this should be pretty cheap). (edit Bill Karwin correctly points out the merits of being able to keep eligible tags, not just current tags).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With