Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GUID vs Identity for use with tags

I know this subject has been discussed at length (I have read lots of posts here and all over the web) and I don't like to flog a dead horse but I have a question about a more specific aspect of Integer ID vs GUID:

I am writing out a schema which is going to have a tagging ability similar in use to the stackoverflow tags but it will have the same tags used against 5+ different tables.

The basic tables I will be linking are as follows:

Tag Table

Tag ID    Tag Name      Tag Description
-------------------------------------------------------------
     1    Hats          Tag for hats
     2    Coats         Tag for coats
     3    Gloves        Gloves tag
     4    Ladies        Ladies item

Items Table 1

Item ID    Item Name    Cost
------------------------------------------------------------
      1    Deerstalker  £20.00
      2    Fedora       £50.00
      3    Scarf        £15.00

The bit I'm having trouble with is the tag_item table.

I will have 5 tables with completely different structures that I want the users to be able to apply tags to so I think I'll need to do one of the following:

  • Store table name/table number as well as the integer key of the row the tag relates to
  • Store a GUID of the row, this will work independent of the table and make it much easier to get all tags for a certain row.

What I'm unsure of is how this will affect performance for:

  • Searching for all items over 5 tables with a certain tag/tags
  • editing the tags for an item
  • joins

Is there any clearly better option in this case or anywhere I could read up on the advantages in this particular scenario?


Edit:

For anyone interested, below is the schema I have decided upon:

Table for Tags

CREATE TABLE [dbo].[Sys_TagList](

    [Sys_Tag_Primary] [int] IDENTITY(1,1) NOT NULL,
    [Sys_Tag_Name] [varchar](50) NOT NULL,
    [Sys_Tag_Description] [varchar](1000) NULL

)

Table for Tag_Items

CREATE TABLE [dbo].[Meta_Tags](

    [Met_Tag_Primary] [int] IDENTITY(1,1) NOT NULL,
    [Met_Tag_Link_FK] [int] NOT NULL, -- Link to [Sys_Tag_Primary] field
    [Met_Tag_Name] [varchar](25) NOT NULL, -- To enable tag searches to use a single table
    [Met_Tag_Table] [varchar](25) NOT NULL, -- Table name, used with [Met_Tag_RowID] to create link to that item
    [Met_Tag_RowID] [int] NOT NULL -- Primary key of item[n] table

)
like image 670
bendataclear Avatar asked Oct 15 '12 16:10

bendataclear


2 Answers

I'm going to (perhaps unhelpfully) suggest that GUID vs ID is the wrong question to be focusing on. The assumption that you need a single table linking tags to items may be worth examining first.

If your five item categories are different enough to require different tables then you should probably consider five tag_item link tables - one for each item table. Although the code to query them all at once (using UNION ALL to concatenate the results) may be a bit more verbose and repetetive than the code for a single link table, the queries are likely to be substantially simpler - good from both a maintenance and a query-plan perspective.

The other alternative would be to refactor the common attributes that the five item categories share into a single central table (to which the tags are linked) and store the additional non-common attributes in five child tables. It's difficult to know whether this is appropriate without some more information about how the existing data is currently used.

like image 98
Ed Harper Avatar answered Nov 09 '22 04:11

Ed Harper


Obligatory possible tagging design duplicate

Depending on how you intend to use the tags and what performance characteristics are important, one possible alternative to the map/link table is a hashed combination table.

Calculate the hash of a given requested combination of tags and store that directly in your existing tables, and in the combination table.

Querying for anything that is a subset of tags might start to get kind of hacky though as you'll end up doing a lot of in memory transforms of all those hashes. Perhaps there is a cleaner way of querying hashed reference values out there, as I cant recall doing this, and didn't immediately locate anything in a search.

like image 28
JustinC Avatar answered Nov 09 '22 06:11

JustinC