Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Database design for tags or tagging

How are tags of an item stored in a database hassle-free?

Each item has multiple tags with it. I have read a few answers on an efficient way to do so :

  1. What is the most efficient way to store tags in a database?
  2. Recommended SQL database design for tags or tagging

But I think there is a better solution to this. Why can't we simply include tags as a long string for each item?

 Table : Brand_Shops
 Columns : brand_id, brand_name, content, tags

Example :

1 || Nike ||  shoes bags sports football soccer t-shirts track-pants
2 ||  GAP || wallets t-shirts jeans shoes perfumes

This does not have atomicity but completely suits the purpose of tagging. If a new brand has to be added, new tags can simply be added along with it. Because of this it will be very easy to fetch it as well. I don't understand why this is not an efficient solution.

like image 411
Shivansh Jagga Avatar asked Jun 11 '18 08:06

Shivansh Jagga


People also ask

What is tagging in database?

In information systems, a tag is a keyword or term assigned to a piece of information (such as an Internet bookmark, multimedia, database record, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching.

What are the 3 database design steps?

The methodology is depicted as a bit by bit guide to the three main phases of database design, namely: conceptual, logical, and physical design.

What is a design tag?

What are Tags? Tags are a word or phrase that describes your design! Tagging your art with accurate and relevant keywords is one of the most important things you can to help people find your art in both the Threadless Marketplace search and for web searches like Google.


1 Answers

I don't understand why this is not an efficient solution.

It is inefficient because you have to retrieve and break/search that string for every query.

When you do something like (as mentioned in your links) Three tables (one for storing all items, one for all tags, and one for the relation between the two) then you can use the real power of a relational database, the index.

Instead of breaking each string into a tag or set of tags... that's already done; you just get the ones you want. So, if you're searching for "shoes" then it goes straight there (using the index probably log n or faster) and returns both Nike and GAP. It will do this no matter how many tags you have, no matter how many companies you have.

With the 3-table system you do all of the hard work up front and then just do lookups.

If you intend to run this locally or with a limited number of users your solution may be fine. It is also easier to code.
Once your queries start taking more than a few seconds you'll probably want to update your tagging system. If you do it this way, write the search code separately in case you need to rip it out.


Question from comment:

Can you give an example of a 3 table system that is normalized with atomicity

Sure.
You've basically asked for Third Normal Form which is my usual goal. (I admit I often don't make 3NF because I optimize; e.g. storing a postal code with the addres - if you're out of school, that's a better choice)

--Sample SQL stackoverflow.com/questions/50793168/database-design-for-tags-or-tagging/50818392
IF NOT EXISTS (SELECT * FROM sys.schemas WHERE name = N'ChrisC') 
BEGIN
 EXEC sys.sp_executesql N'CREATE SCHEMA [ChrisC] AUTHORIZATION [dbo]'
 IF NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[ChrisC].[Brands]') AND type in (N'U'))
     AND NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[ChrisC].[BrandTags]') AND type in (N'U'))
     AND NOT EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[ChrisC].[Tags]') AND type in (N'U'))
 BEGIN
  CREATE TABLE [ChrisC].[Brands]([pkBrand] [int] IDENTITY(101,1) NOT NULL,[Name] [varchar](40) NULL) ON [PRIMARY]
  INSERT INTO [ChrisC].[Brands]([Name])VALUES('Nike'),('GAP')
  CREATE TABLE [ChrisC].[BrandTags]([pk] [int] IDENTITY(1,1) NOT NULL,[Brand] [int] NULL,[Tag] [int] NULL) ON [PRIMARY]
  INSERT INTO [ChrisC].[BrandTags]([Brand],[Tag])VALUES
    (101,201),(101,202),(101,203),(101,204),(101,205),(101,206),(101,207),
    (102,208),(102,209),(102,203),(102,207),(102,210)
  CREATE TABLE [ChrisC].[Tags]([pkTag] [int] IDENTITY(201,1) NOT NULL,[Tag] [varchar](40) NULL) ON [PRIMARY]
  INSERT INTO [ChrisC].[Tags]([Tag])VALUES
    ('bags'),('football'),('shoes'),('soccer'),('sports'),('track-pants'),('t-shirts'),('jeans'),('perfumes'),('wallets')
  SELECT b.[Name], t.Tag 
  FROM chrisc.Brands b 
  LEFT JOIN chrisc.BrandTags bt ON pkBrand = Brand
  LEFT JOIN chrisc.Tags t ON bt.Tag = t.pkTag
  WHERE b.[Name] = 'Nike'
  -- Stop execution here to see the tables with data
  DROP TABLE [ChrisC].[Brands]
  DROP TABLE [ChrisC].[BrandTags]
  DROP TABLE [ChrisC].[Tags]
 END
 IF  EXISTS (SELECT * FROM sys.schemas WHERE name = N'ChrisC') DROP SCHEMA [ChrisC]
END
like image 99
J. Chris Compton Avatar answered Sep 22 '22 15:09

J. Chris Compton