Database Design for Tagging [closed]

People also ask

What is the most efficient way to store tags in a database?

I'd suggest using intermediary third table for storing tags<=>items associations, since we have many-to-many relations between tags and items, i.e. one item can be associated with multiple tags and one tag can be associated with multiple items.

What is tagging in database?

In information systems, a tag is a keyword or term assigned to a piece of information (such as an Internet bookmark, multimedia, database record, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching.

Here's a good article on tagging Database schemas:

http://howto.philippkeller.com/2005/04/24/Tags-Database-schemas/

along with performance tests:

http://howto.philippkeller.com/2005/06/19/Tagsystems-performance-tests/

Note that the conclusions there are very specific to MySQL, which (at least in 2005 at the time that was written) had very poor full text indexing characteristics.

About ANDing: It sounds like you are looking for the "relational division" operation. This article covers relational division in concise and yet comprehendible way.

About performance: A bitmap-based approach intuitively sounds like it will suit the situation well. However, I'm not convinced it's a good idea to implement bitmap indexing "manually", like digiguru suggests: It sounds like a complicated situation whenever new tags are added(?) But some DBMSes (including Oracle) offer bitmap indexes which may somehow be of use, because a built-in indexing system does away with the potential complexity of index maintenance; additionally, a DBMS offering bitmap indexes should be able to consider them in a proper when when performing the query plan.

I just wanted to highlight that the article that @Jeff Atwood links to (http://howto.philippkeller.com/2005/04/24/Tags-Database-schemas/) is very thorough (It discusses the merits of 3 different schema approaches) and has a good solution for the AND queries that will usually perform better than what has been mentioned here so far (i.e. it doesn't use a correlated subquery for each term). Also lots of good stuff in the comments.

ps - The approach that everyone is talking about here is referred to as the "Toxi" solution in the article.

I don't see a problem with a straightforward solution: Table for items, table for tags, crosstable for "tagging"

Indices on cross table should be enough optimisation. Selecting appropriate items would be

SELECT * FROM items WHERE id IN  
    (SELECT DISTINCT item_id FROM item_tag WHERE  
    tag_id = tag1 OR tag_id = tag2 OR ...)

AND tagging would be

SELECT * FROM items WHERE  
    EXISTS (SELECT 1 FROM item_tag WHERE id = item_id AND tag_id = tag1)  
    AND EXISTS (SELECT 1 FROM item_tag WHERE id = item_id AND tag_id = tag2)  
    AND ...

which is admittedly, not so efficient for large number of comparing tags. If you are to maintain tag count in memory, you could make query to start with tags that are not often, so AND sequence would be evaluated quicker. Depending on expected number of tags to be matched against and expectancy of matching any single of them this could be OK solution, if you are to match 20 tags, and expect that some random item will match 15 of them, then this would still be heavy on a database.

Related questions
                            
                                Removing duplicate rows from table in Oracle
                            
                                Sql Server equivalent of a COUNTIF aggregate function
                            
                                Fastest way to determine if record exists
                            
                                How to retrieve the current value of an oracle sequence without increment it?
                            
                                How do you run a single query through mysql from the command line?
                            
                                How to insert multiple rows from a single query using eloquent/fluent
                            
                                What is this operator <=> in MySQL?
                            
                                How big is too big for a PostgreSQL table?
                            
                                How to select the first row of each group?
                            
                                sqlite alter table add MULTIPLE columns in a single statement
                            
                                Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints
                            
                                Oracle "(+)" Operator
                            
                                Difference between BYTE and CHAR in column datatypes
                            
                                How do you run a SQL Server query from PowerShell?
                            
                                How can I get the SQL of a PreparedStatement?
                            
                                SQL UPDATE all values in a field with appended string CONCAT not working
                            
                                How to do a case sensitive search in WHERE clause (I'm using SQL Server)?
                            
                                Better techniques for trimming leading zeros in SQL Server?
                            
                                Select SQL Server database size
                            
                                SQL Server 2005 How Create a Unique Constraint?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Database Design for Tagging [closed]

Tags:

sql

database-design

tags

tagging

People also ask

Recent Activity

Donate For Us