How to implement tag system

Tags: algorithm, system, tagging

I was wondering what the best way is to implement a tag system, like the one used on SO. I have been thinking about this, but I can't come up with a good scalable solution.

I was thinking of a basic three-table solution: a tags table, an articles table, and a tag_to_articles table.

Is this the best solution to this problem, or are there alternatives? With this method the table would get extremely large over time, and I assume searching it would not be very efficient. On the other hand, it is not that important that the query executes fast.

577

asked Nov 27 '09 19:11

Saif Bechan

1 Answer

I believe you'll find this blog post interesting: Tags: Database schemas

The Problem: You want to have a database schema where you can tag a bookmark (or a blog post or whatever) with as many tags as you want. Later then, you want to run queries to constrain the bookmarks to a union or intersection of tags. You also want to exclude (say: minus) some tags from the search result.

“MySQLicious” solution

In this solution, the schema has just one denormalized table. This type is called the “MySQLicious solution” because MySQLicious imports del.icio.us data into a table with this structure.

Intersection (AND) Query for “search+webservice+semweb”:

SELECT * FROM `delicious` WHERE tags LIKE "%search%" AND tags LIKE "%webservice%" AND tags LIKE "%semweb%"

Union (OR) Query for “search|webservice|semweb”:

SELECT * FROM `delicious` WHERE tags LIKE "%search%" OR tags LIKE "%webservice%" OR tags LIKE "%semweb%"

Minus Query for “search+webservice-semweb”

SELECT * FROM `delicious` WHERE tags LIKE "%search%" AND tags LIKE "%webservice%" AND tags NOT LIKE "%semweb%"
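One caveat with the LIKE patterns above is that `%search%` also matches any tag that merely contains that text, such as "research". The following is a minimal sketch of the problem and one possible workaround, not part of the original blog post. It assumes a space-separated `tags` column, uses SQLite through Python's `sqlite3` module purely so it runs without a server, and the table contents and URLs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table layout: one row per bookmark, tags space-separated.
conn.execute("CREATE TABLE delicious (id INTEGER PRIMARY KEY, url TEXT, tags TEXT)")
conn.executemany(
    "INSERT INTO delicious (url, tags) VALUES (?, ?)",
    [
        ("a.example", "search webservice semweb"),
        ("b.example", "research webservice semweb"),  # has no 'search' tag
        ("c.example", "search webservice"),
    ],
)


def urls(sql):
    """Run a query and return the matching URLs in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


# Same shape as the intersection query above: substring matching.
naive = urls("""
    SELECT url FROM delicious
    WHERE tags LIKE '%search%' AND tags LIKE '%webservice%' AND tags LIKE '%semweb%'
""")

# Workaround: pad the column with the delimiter and match whole tags only.
# (|| is SQLite's string concatenation; MySQL would use CONCAT instead.)
delimited = urls("""
    SELECT url FROM delicious
    WHERE ' ' || tags || ' ' LIKE '% search %'
      AND ' ' || tags || ' ' LIKE '% webservice %'
      AND ' ' || tags || ' ' LIKE '% semweb %'
""")

print(naive)      # b.example is a false positive because of 'research'
print(delimited)  # only a.example carries all three tags
```

Either way, a pattern with a leading `%` generally cannot use an ordinary index, so this layout tends to scan the whole table on every search.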

“Scuttle” solution

Scuttle organizes its data in two tables. The table “scCategories” is the “tag” table and has a foreign key to the “bookmark” table.

Intersection (AND) Query for “bookmark+webservice+semweb”:

SELECT b.* FROM scBookmarks b, scCategories c WHERE c.bId = b.bId AND (c.category IN ('bookmark', 'webservice', 'semweb')) GROUP BY b.bId HAVING COUNT( b.bId )=3

First, all bookmark-tag combinations where the tag is “bookmark”, “webservice” or “semweb” are selected (c.category IN ('bookmark', 'webservice', 'semweb')); then only the bookmarks that have all three of the searched tags are kept (HAVING COUNT(b.bId)=3).
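The counting step described above relies on each bookmark having at most one row per tag. Here is a small sketch of what happens otherwise, assuming a two-table layout like Scuttle's. Only `bId` and `category` come from the queries above; the `cId` and `title` columns, the sample rows, and the use of SQLite are assumptions made so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scBookmarks (bId INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE scCategories (
        cId INTEGER PRIMARY KEY,
        bId INTEGER NOT NULL REFERENCES scBookmarks(bId),
        category TEXT NOT NULL
    );
    INSERT INTO scBookmarks VALUES
        (1, 'has all three'), (2, 'duplicate tag row'), (3, 'only two');
    INSERT INTO scCategories (bId, category) VALUES
        (1, 'bookmark'), (1, 'webservice'), (1, 'semweb'),
        (2, 'bookmark'), (2, 'bookmark'), (2, 'webservice'),
        (3, 'bookmark'), (3, 'semweb');
""")

# The intersection query, with the counted expression left as a parameter.
QUERY = """
    SELECT b.bId
    FROM scBookmarks b
    JOIN scCategories c ON c.bId = b.bId
    WHERE c.category IN ('bookmark', 'webservice', 'semweb')
    GROUP BY b.bId
    HAVING {count_expr} = 3
    ORDER BY b.bId
"""


def matching_ids(count_expr):
    """Return the bookmark ids whose count of matching rows equals 3."""
    return [row[0] for row in conn.execute(QUERY.format(count_expr=count_expr))]


# Counting rows: bookmark 2 reaches 3 through a repeated 'bookmark' row.
plain_count = matching_ids("COUNT(b.bId)")
# Counting distinct tags: only bookmark 1 really has all three.
distinct_count = matching_ids("COUNT(DISTINCT c.category)")

print(plain_count)     # [1, 2]
print(distinct_count)  # [1]
```

A unique constraint on (bId, category) would rule the duplicate out at insert time, in which case the plain count is sufficient.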

Union (OR) Query for “bookmark|webservice|semweb”: Just leave out the HAVING clause and you have union:

SELECT b.* FROM scBookmarks b, scCategories c WHERE c.bId = b.bId AND (c.category IN ('bookmark', 'webservice', 'semweb')) GROUP BY b.bId

Minus (Exclusion) Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.

SELECT b.* FROM scBookmarks b, scCategories c WHERE b.bId = c.bId AND (c.category IN ('bookmark', 'webservice')) AND b.bId NOT IN (SELECT b.bId FROM scBookmarks b, scCategories c WHERE b.bId = c.bId AND c.category = 'semweb') GROUP BY b.bId HAVING COUNT( b.bId )=2

Leaving out the HAVING COUNT clause gives the query for “bookmark|webservice-semweb”.
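The exclusion subquery above joins scBookmarks again, although the tag table alone already says which bookmarks carry the excluded tag. As a sketch of an alternative formulation (not taken from the blog post), the same filter can be written with a correlated NOT EXISTS. The sample data and the `title` column are hypothetical, and SQLite is used only to make the comparison runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scBookmarks (bId INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE scCategories (bId INTEGER NOT NULL, category TEXT NOT NULL);
    INSERT INTO scBookmarks VALUES
        (1, 'all three tags'), (2, 'bookmark + webservice'), (3, 'bookmark only');
    INSERT INTO scCategories VALUES
        (1, 'bookmark'), (1, 'webservice'), (1, 'semweb'),
        (2, 'bookmark'), (2, 'webservice'),
        (3, 'bookmark');
""")

# The minus query in the form shown above (selecting only the id).
ORIGINAL = """
    SELECT b.bId FROM scBookmarks b, scCategories c
    WHERE b.bId = c.bId
      AND c.category IN ('bookmark', 'webservice')
      AND b.bId NOT IN (SELECT b.bId FROM scBookmarks b, scCategories c
                        WHERE b.bId = c.bId AND c.category = 'semweb')
    GROUP BY b.bId
    HAVING COUNT(b.bId) = 2
"""

# Same logic with explicit JOIN syntax and a correlated NOT EXISTS
# that only touches the tag table.
REWRITTEN = """
    SELECT b.bId
    FROM scBookmarks b
    JOIN scCategories c ON c.bId = b.bId
    WHERE c.category IN ('bookmark', 'webservice')
      AND NOT EXISTS (SELECT 1 FROM scCategories x
                      WHERE x.bId = b.bId AND x.category = 'semweb')
    GROUP BY b.bId
    HAVING COUNT(b.bId) = 2
"""

original_ids = [row[0] for row in conn.execute(ORIGINAL)]
rewritten_ids = [row[0] for row in conn.execute(REWRITTEN)]

print(original_ids)   # [2]
print(rewritten_ids)  # [2]
```

Which form performs better depends on the database and its optimizer, so it is worth checking the query plan on real data before preferring one over the other.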

“Toxi” solution

Toxi came up with a three-table structure. The table “tagmap” relates bookmarks and tags n-to-m: each tag can be used with many bookmarks and vice versa. This DB schema is also used by WordPress. The queries are much the same as in the “Scuttle” solution.
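For reference, a three-table layout of this kind could be declared roughly as follows. This is a sketch under assumptions rather than the exact schema from the blog post: the table and column names follow the queries in this answer, while the `url` column, the composite primary key, and the extra index are choices made for illustration. SQLite is used only so the snippet runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE bookmark (
        id  INTEGER PRIMARY KEY,
        url TEXT NOT NULL
    );
    CREATE TABLE tag (
        tag_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL UNIQUE      -- one row per distinct tag name
    );
    CREATE TABLE tagmap (
        bookmark_id INTEGER NOT NULL REFERENCES bookmark(id),
        tag_id      INTEGER NOT NULL REFERENCES tag(tag_id),
        PRIMARY KEY (bookmark_id, tag_id) -- a bookmark gets each tag at most once
    );
    -- Lookup in the other direction: all bookmarks for a given tag.
    CREATE INDEX tagmap_by_tag ON tagmap (tag_id, bookmark_id);
""")

conn.execute("INSERT INTO bookmark (id, url) VALUES (1, 'https://example.com')")
conn.execute("INSERT INTO tag (tag_id, name) VALUES (1, 'semweb')")
conn.execute("INSERT INTO tagmap (bookmark_id, tag_id) VALUES (1, 1)")

# Tagging the same bookmark with the same tag again violates the primary key.
try:
    conn.execute("INSERT INTO tagmap (bookmark_id, tag_id) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

The mapping table does grow with every tag assignment, but each row is just two integers, and with both indexes the tag searches are index lookups rather than full scans.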

Intersection (AND) Query for “bookmark+webservice+semweb”

SELECT b.* FROM tagmap bt, bookmark b, tag t WHERE bt.tag_id = t.tag_id AND (t.name IN ('bookmark', 'webservice', 'semweb')) AND b.id = bt.bookmark_id GROUP BY b.id HAVING COUNT( b.id )=3

Union (OR) Query for “bookmark|webservice|semweb”

SELECT b.* FROM tagmap bt, bookmark b, tag t WHERE bt.tag_id = t.tag_id AND (t.name IN ('bookmark', 'webservice', 'semweb')) AND b.id = bt.bookmark_id GROUP BY b.id

Minus (Exclusion) Query for “bookmark+webservice-semweb”, that is: bookmark AND webservice AND NOT semweb.

SELECT b.* FROM bookmark b, tagmap bt, tag t WHERE b.id = bt.bookmark_id AND bt.tag_id = t.tag_id AND (t.name IN ('bookmark', 'webservice')) AND b.id NOT IN (SELECT b.id FROM bookmark b, tagmap bt, tag t WHERE b.id = bt.bookmark_id AND bt.tag_id = t.tag_id AND t.name = 'semweb') GROUP BY b.id HAVING COUNT( b.id )=2

Leaving out the HAVING COUNT clause gives the query for “bookmark|webservice-semweb”.
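The three Toxi queries differ only in the tag lists and in whether the HAVING clause is present, so they can be generated from one helper. The sketch below is one possible way to do that with bound parameters instead of tag names pasted into the SQL. The function `find_bookmarks`, its arguments, and the sample rows are hypothetical, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3


def find_bookmarks(conn, include, exclude=(), match_all=True):
    """Return ids of bookmarks tagged with `include`, minus those tagged with `exclude`.

    match_all=True gives the intersection (AND); False gives the union (OR).
    """
    include = list(dict.fromkeys(include))  # drop repeated names so the count is right
    exclude = list(exclude)
    if not include:
        return []
    in_marks = ", ".join("?" * len(include))
    sql = f"""
        SELECT b.id
        FROM bookmark b
        JOIN tagmap bt ON bt.bookmark_id = b.id
        JOIN tag t ON t.tag_id = bt.tag_id
        WHERE t.name IN ({in_marks})
    """
    params = include[:]
    if exclude:
        ex_marks = ", ".join("?" * len(exclude))
        sql += f"""
          AND NOT EXISTS (
              SELECT 1 FROM tagmap xbt
              JOIN tag xt ON xt.tag_id = xbt.tag_id
              WHERE xbt.bookmark_id = b.id AND xt.name IN ({ex_marks})
          )
        """
        params += exclude
    sql += " GROUP BY b.id"
    if match_all:
        # Keep only bookmarks that matched every requested tag.
        sql += " HAVING COUNT(DISTINCT t.tag_id) = ?"
        params.append(len(include))
    sql += " ORDER BY b.id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookmark (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE tagmap (bookmark_id INTEGER, tag_id INTEGER,
                         PRIMARY KEY (bookmark_id, tag_id));
    INSERT INTO bookmark (id) VALUES (1), (2), (3), (4);
    INSERT INTO tag VALUES (1, 'bookmark'), (2, 'webservice'), (3, 'semweb');
    INSERT INTO tagmap VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 3), (4, 1);
""")

all_three = find_bookmarks(conn, ["bookmark", "webservice", "semweb"])
any_of = find_bookmarks(conn, ["bookmark", "webservice", "semweb"], match_all=False)
and_minus = find_bookmarks(conn, ["bookmark", "webservice"], exclude=["semweb"])
or_minus = find_bookmarks(conn, ["bookmark", "webservice"], exclude=["semweb"],
                          match_all=False)

print(all_three)  # [1]
print(any_of)     # [1, 2, 3, 4]
print(and_minus)  # [2]
print(or_minus)   # [2, 4]
```

Only the number of placeholders is interpolated into the SQL text; the tag names themselves travel as parameters, which avoids quoting problems with user-supplied tags.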

154

answered Oct 10 '22 19:10

Nick Dandoulakis
