Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How should I design the tables to store tags in a database?

"question_id": 58640
"tags": ["polls", "fun", "quotes"]
"title": "Great programming quotes"
"question_id": 184618
"tags": ["polls", "fun", "comment"]
"title": "What is the best comment in source code you have ever encountered?"
"question_id": 3734102
"tags": ["c++", "linux", "exit-code"]
"title": "Why cant' I return bigger values from main function ?"
"question_id": 2349378
"tags": ["communication", "terminology", "vocabulary"]
"title": "New programming jargon you coined?"
"question_id": 3723817
"tags": ["open-source", "project-management", "failure", "fail"]
"title": "How to make an open source project fail"
"question_id": 3699150
"tags": ["testing", "interview-questions", "job-interview"]
"title": "Interview question please help"

This is just a text extract some questions that I got using the SO API.

To make this query-able, I want to use SQLite to store the data.

How should I store the tags column?

Since the limit here on SO is five tags, I can use five columns tag1, tag2 ..., but I think there would be something more elegant that can be done. Something that scales to any number of tags being there, and can also handle basic queries like

select title from table where tag has "c++" and "boost" but not "c"
like image 553
Lazer Avatar asked Dec 17 '25 19:12

Lazer


2 Answers

This is a many to many relationship : questions have multiple tags, tags can appear in multiple questions. This means you have to create three tables, one for the questions, one for the tags, and one for the links between these tables. The resulting query would look like this:

SELECT title FROM question
       INNER JOIN question_tag_link USING (question_id)
       INNER JOIN tag USING (tag_id)
            WHERE tag_name IN('c++', 'boost')
              AND NOT EXISTS(
           SELECT * FROM tag t1
            WHERE t1.tag_name = 'c'
              AND t1.question_id = question.question_id);

Not so simple, but I think it is the price to pay if you don't want to be limited. If there are less than 64 different tags, you could use the SET field type, but you would loose very much flexibility (hard to add a new tag).

like image 69
greg0ire Avatar answered Dec 20 '25 00:12

greg0ire


alt text

select distinct a.QuestionTitle
from
(
select q.QuestionID, QuestionTitle, TagName 
from QuestionTags as x
join Question     as q on q.QuestionID = x.QuestionID 
join Tag          as t on t.TagID      = x.TagID 
where TagName in ('c++', 'boost')
) as a
left join
(
select q.QuestionID, QuestionTitle, TagName 
from QuestionTags as x
join Question     as q on q.QuestionID = x.QuestionID 
join Tag          as t on t.TagID      = x.TagID 
where TagName = 'c'
) as b on b.QuestionID = a.QuestionID
where b.QuestionTitle is null
order by a.QuestionTitle ;
like image 42
Damir Sudarevic Avatar answered Dec 19 '25 23:12

Damir Sudarevic



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!