Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tagging system: Toxi solution questions

I'm sort of breaking my head over the Toxi solution for tag database schemas. I'm working on a system to which users can submit items, and those items can have tags associated with them. After reading up on tagschemas, I found the Toxi solution to suit my needs most. However, I'm not entirely sure if I'm planning this right, so I'd like your opinions on this please.

I'll have three databases.
items containing item_id and others
tagmap using item_id and tag_id as foreign keys
tags containing tag_id and tag_text

When adding a new item, am I right to assume the process to add the tags to the database is as follows?

  1. sort submitted tags into array
  2. for every tag in the array:
    1. get tag_id from tags where tag_text matches the current tag
    2. if that returns 0 rows:
      1. add tag to tags table
      2. get the tag_id
    3. add item_id and tag_id to tagmap
  3. finish (give user the a-okay, etc)

This means we'll end up with an entry in the tagmap for every tag for every item. It seems correct, but I can't help but think there's a better way to do that than ending up with a huge amount of entries there...

As for editing the tags, I've thought up the following process, though I think there's a better way which I haven't found yet.

  1. get tags using item_id and insert into editable field
  2. user makes changes. on submit:
  3. delete rows from tagmap where item_id matches the one being edited
  4. same process as the one listed above

I'm kind of iffy about point 3 there. Is there a way for me to check if any tags have been removed, so I can selectively delete tags instead of just deleting and re-adding them? And just to be sure: when deleting tagmap rows, the related items won't be deleted with it because it points to a foreign key instead of acting as one, right?

Also, I may want to keep track of the amount of times a tag is used, but I don't want to run a query to count those every time it needs to be displayed. I'm thinking of having a cron job count the number of instances for every tag_id in tagmap once every hour, or bihourly, and then update the tag_use value in the tags table. Is that a correct way to do it, or is there a better way?

Looking back, that's quite the hefty amount of text. Welp, rather too detailed than missing information, and rather asking too many questions and learning a lot of new things than asking too few. Good chance I've just spent too much time looking into this today, and it'll all make more sense tomorrow.

Thanks in advance!

like image 685
Fang Avatar asked Dec 07 '22 14:12

Fang


1 Answers

First of all "toxi" is not a standard term. Always define your terms! Or at least provide relevant links.

And now to the question itself...

I'll have three databases.

No, you'll have 3 tables.

When adding a new item...

You are pretty much on the right track, with the exception that you can use the set-based nature of SQL to "merge" many of these steps. For example, tagging an item 1 with tags: 'tag1', 'tag2' and 'tag3' can be done like this...

INSERT IGNORE INTO tagmap (item_id, tag_id)
SELECT 1, tag_id FROM tags WHERE tag_text IN ('tag1', 'tag2', 'tag3');

The IGNORE allows this to succeed even if item is already connected to some of these tags.

This assumes all required tags are already in tags. Assuming tag.tag_id is auto-increment, you can do something like this to ensure they are:

INSERT IGNORE INTO tags (tag_text) VALUES ('tag1'), ('tag2'), ('tag3');

This means we'll end up with an entry in the tagmap for every tag for every item. It seems correct, but I can't help but think there's a better way to do that then ending up with a huge amount of entries there...

There is no magic. If "item is connected to a particular tag" is piece of knowledge you want to record, then it will have to have some sort of physical representation in the database.

As for editing the tags...

You mean re-tagging items (not modifying tags themselves)?

To remove all tags that are not in the list, do something like this:

DELETE FROM tagmap
WHERE
    item_id = 1
    AND tag_id NOT IN (
        SELECT tag_id FROM tags
        WHERE tag_text IN ('tag1', 'tag3')
    );

This will disconnect the item from all tags except 'tag1' and 'tag3'. Execute the INSERT above and this DELETE one after another to "cover" both adding and removing tags.

You can play with all this in the SQL Fiddle.

And just to be sure: when deleting tagmap rows, the related items won't be deleted with it because it points to a foreign key instead of acting as one, right?

Correct. A child endpoint of a FK will not trigger a referential action (such as ON DELETE CASCADE), only parent will.

BTW, you are using this schema because you want additional fields in tags (beside tag_text), right? If you do, not loosing this additional data just because all connections are gone is desired behavior.

But if you just wanted the tag_text, you'd use a simpler schema where deleting all connections would be the same as deleting the tag itself:

enter image description here

This would not just simplify the SQL, it would also provide better clustering.

At first glance, "toxi" might look like it's saving space, but this might actually not be the case in practice, since it requires additional tables and indexes (and tags tend to be short).

Also, I may want to keep track of the amount of times a tag ... cron job ...

Measure before you decide to do something like this. My SQL Fiddle mentioned above uses a very deliberate order of fields in the tagmap PK, so data is clustered in a way very friendly to this kind of counting (remember: InnoDB tables are clustered). You'd have to have a truly huge amount of items (or require unusually high performance) before this becomes a problem.

In any case, measure on realistic amounts of data!

like image 74
Branko Dimitrijevic Avatar answered Dec 10 '22 10:12

Branko Dimitrijevic