 

What is the data model used for tags and tag synonyms?

Tags: tags, synonym

I asked this question on Meta, but I now realize that it may be more appropriate for the main site, as it is a general question that applies to any tag-based system (I am happy to close or delete one, depending on where people think this question belongs).


I have a similar system of tagged data, and I am running into the same problem Stack Overflow did: I have lots of tags that are really the same thing. I am trying to create a tag synonym page similar to Stack Overflow's to help organize this information.

A few questions around the relationships and "data model" of tag synonyms:

I assume that a master tag can have multiple synonym tags but a synonym tag can only be a synonym for one master tag. Is that correct?
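A minimal sketch of that model in Python, assuming an in-memory store (the class name `SynonymModel` and its `add` method are hypothetical, not Stack Overflow's code). Keying the mapping on the synonym makes "one master per synonym" structural, while a reverse index lets one master collect many synonyms:

```python
from collections import defaultdict


class SynonymModel:
    """Hypothetical in-memory model: each synonym maps to exactly one master."""

    def __init__(self) -> None:
        # Keyed on the synonym, so a synonym can hold only one master.
        self.master_of: dict[str, str] = {}
        # Reverse index: one master can collect many synonyms.
        self.synonyms_of: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, synonym: str, master: str) -> None:
        current = self.master_of.get(synonym)
        if current is not None and current != master:
            # Refuse to give a synonym a second master.
            raise ValueError(f"{synonym!r} is already a synonym of {current!r}")
        self.master_of[synonym] = master
        self.synonyms_of[master].add(synonym)


model = SynonymModel()
model.add("java-script", "js")
model.add("js-web", "js")
print(sorted(model.synonyms_of["js"]))  # ['java-script', 'js-web']
```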

Also, can a master tag be a synonym tag as well? For example, let's say you have a tag called javascript and you have:

Master: js
Synonyms: java-script, js-web

Can you also have:

Master: javascript
Synonyms: js

So in the example above, you would have to keep resolving until js-web ultimately resolves to javascript, because the master tag js is itself a synonym tag.
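Resolving a chain like this is a loop that follows synonym-to-master links until it reaches a tag that is not itself a synonym. A minimal sketch, assuming the mappings are held in a plain dict from synonym to master (a hypothetical representation; `resolve` is an illustrative name), using the tags from the example above. It raises instead of looping forever if the chain turns out to be circular:

```python
def resolve(tag: str, master_of: dict[str, str]) -> str:
    """Follow synonym -> master links until reaching a tag that is not a synonym."""
    seen = {tag}
    while tag in master_of:
        tag = master_of[tag]
        if tag in seen:
            # Guard: a circular chain would otherwise loop forever.
            raise ValueError(f"circular synonym chain through {tag!r}")
        seen.add(tag)
    return tag


# The two mappings from the example, stored as synonym -> master.
master_of = {
    "java-script": "js",
    "js-web": "js",
    "js": "javascript",
}
print(resolve("js-web", master_of))  # javascript
```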

That also makes me think you could run into a circular reference, where you have:

Master: js
Synonyms: java-script, javascript

and

Master: javascript
Synonyms: js

How does the system deal with circular references?
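One way a system could detect circular references in existing data is to walk each synonym chain and note when it steps back onto a tag already on the current path. A sketch under assumptions: the mapping is a hypothetical dict from synonym to master, `tags_on_cycles` is an illustrative name, and the sample data lists js and javascript each as a synonym of the other:

```python
def tags_on_cycles(master_of: dict[str, str]) -> set[str]:
    """Return every tag that sits on a circular synonym chain."""
    on_cycle: set[str] = set()
    done: set[str] = set()  # tags whose chains were already fully walked
    for start in master_of:
        path: list[str] = []
        index: dict[str, int] = {}  # position of each tag on the current path
        tag = start
        while tag in master_of and tag not in done and tag not in index:
            index[tag] = len(path)
            path.append(tag)
            tag = master_of[tag]
        if tag in index:
            # The walk returned to its own path: everything from there on is a cycle.
            on_cycle.update(path[index[tag]:])
        done.update(path)
    return on_cycle


circular = {"java-script": "js", "js": "javascript", "javascript": "js"}
acyclic = {"java-script": "js", "js-web": "js", "js": "javascript"}
print(sorted(tags_on_cycles(circular)))  # ['javascript', 'js']
print(tags_on_cycles(acyclic))  # set()
```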

Asked Aug 30 '11 at 14:08 by leora




1 Answer

It is tempting to give you a more theoretical answer on Meta concerning folksonomies, polysemy and such! Since I am answering on the StackOverflow side, I will try to give a marginally more technical answer. Running queries with the StackOverflow Data Explorer lets me attempt to answer your questions (I am not affiliated with StackOverflow, so I can't know for sure).

On StackOverflow, the master/synonym tag relationship is carefully stewarded and cultivated. At the time of writing, the Data Explorer shows:

  • Tags has 29488 rows
  • TagSynonyms has 1916 rows

It is interesting to contrast this with other folksonomies. As one article, "Technorati tags: Good idea, terrible implementation", states:

"Technorati advertises that they're now tracking 466,951 different tags, which is pretty darn impressive when you consider that a typical dictionary has around 75,000 entries"

A quick caveat: I usually write Oracle SQL, and I assume the Data Explorer uses SQL Server, so my queries may be a little amateurish. First, my presumptions about the data:

  • Anything listed in the Tags table is a "master tag".
  • In the TagSynonyms table, TargetTagName is a "master tag" and SourceTagName is the "synonym tag".

Now to your specific questions:

"I assume that a master tag can have multiple synonym tags but a synonym tag can only be a synonym for one master tag. Is that correct?"

select * from TagSynonyms where TargetTagName = 'javascript'

Result: Yes. A master tag can have multiple synonym tags.

select SourceTagName, count(*) from TagSynonyms group by SourceTagName having count(*) > 1

Result: Yes. No SourceTagName appears more than once, so a synonym tag can only be a synonym for one master tag.
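To illustrate the shape of these two queries, here is a small reproduction using Python's built-in sqlite3 module rather than SQL Server. The table and column names follow the presumptions above, but the column types, the primary keys, and the sample rows are hypothetical, not the real Data Explorer schema. A PRIMARY KEY on SourceTagName is one way a database could make the second query's empty result a guarantee rather than an observation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Tags (TagName TEXT PRIMARY KEY);
    -- Hypothetical: the PRIMARY KEY on SourceTagName enforces one master per synonym.
    CREATE TABLE TagSynonyms (
        SourceTagName TEXT PRIMARY KEY,
        TargetTagName TEXT NOT NULL
    );
    INSERT INTO Tags VALUES ('javascript'), ('js');
    INSERT INTO TagSynonyms VALUES ('js', 'javascript'), ('java-script', 'javascript');
    """
)

# Query 1: every synonym of one master tag (can return several rows).
synonyms = conn.execute(
    "SELECT SourceTagName FROM TagSynonyms "
    "WHERE TargetTagName = 'javascript' ORDER BY SourceTagName"
).fetchall()

# Query 2: synonyms mapped to more than one master (empty here).
duplicates = conn.execute(
    "SELECT SourceTagName, COUNT(*) FROM TagSynonyms "
    "GROUP BY SourceTagName HAVING COUNT(*) > 1"
).fetchall()

print(synonyms)    # [('java-script',), ('js',)]
print(duplicates)  # []
```

With this schema, an attempt to give js a second master (for example `INSERT INTO TagSynonyms VALUES ('js', 'ecmascript')`) fails with `sqlite3.IntegrityError`.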

"Also, can a master tag also be a synonym tag?"

select TagName from Tags
intersect
select SourceTagName from TagSynonyms

Result: Yes. A master tag can also be a synonym tag. When I ran this query, 465 tags were both synonym and master.

"How does the system deal with circular references?"

This is where my logic/SQL may let me down. The question is: can I find any circular references? To do this, I think I need to work out:

  • Set a - set of tags that are both master and synonym
  • Set b - synonyms for the synonyms of the tags in set a
  • Set c - a intersection b

Anything in set c would be a circular reference.
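The set logic above can be sketched with Python sets, assuming hypothetical in-memory data: `tags` stands in for the Tags table and `synonyms` for (SourceTagName, TargetTagName) rows from TagSynonyms. The function name `circular_candidates` is made up for illustration:

```python
def circular_candidates(tags: set[str], synonyms: list[tuple[str, str]]) -> set[str]:
    """Compute set c = a ∩ b from the definitions of sets a and b."""
    sources = {source for source, _ in synonyms}

    def synonyms_of(masters: set[str]) -> set[str]:
        # Every synonym whose master is in the given set.
        return {source for source, target in synonyms if target in masters}

    a = tags & sources               # set a: both master and synonym
    b = synonyms_of(synonyms_of(a))  # set b: synonyms of the synonyms of set a
    return a & b                     # set c


# Hypothetical data: js and javascript each point at the other.
circular = [("js", "javascript"), ("javascript", "js")]
# Hypothetical data: a simple chain java-script -> js -> javascript.
chain = [("java-script", "js"), ("js", "javascript")]

print(sorted(circular_candidates({"js", "javascript"}, circular)))  # ['javascript', 'js']
print(circular_candidates({"java-script", "js", "javascript"}, chain))  # set()
```

If every tag on a cycle is present in Tags, each of them ends up in both a and b, so an empty b rules out such cycles. The reverse does not hold: a non-circular chain of three or more hops (for example w -> x -> y -> z, all present in Tags) also puts its first tag in c, so a non-empty c is a reason to look closer rather than proof of a cycle.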

We have already calculated set a above (it has 465 rows).

Set b - synonyms for the synonyms of set a

select SourceTagName from TagSynonyms where TargetTagName in (
select SourceTagName from TagSynonyms where TargetTagName in (
select TagName from Tags
intersect
select SourceTagName from TagSynonyms
))

Result: 0 rows

We can stop here; there is no point working out set c, as we already know set b is empty.

Unless I got my logic or SQL wrong (which is very possible), it seems there are no circular references on StackOverflow. I would imagine there are technical processes in place to prevent circular references from happening (otherwise StackOverflow could suffer a stack overflow!).
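One technical process that would prevent circular references, sketched here as an assumption rather than a description of how StackOverflow actually works, is to flatten chains on write: always point a new synonym at the root master, repoint any existing synonyms of the tag being demoted, and reject a mapping whose root is the synonym itself. The function name `add_synonym` and the dict-based store are hypothetical:

```python
def add_synonym(master_of: dict[str, str], synonym: str, master: str) -> None:
    """Map `synonym` to the root of `master`, keeping every chain one hop long."""
    root = master
    while root in master_of:  # at most one hop while the store stays flat
        root = master_of[root]
    if root == synonym:
        # The new link would close a loop back to the synonym itself.
        raise ValueError(f"{synonym!r} -> {master!r} would create a circular reference")
    # Repoint anything that currently targets `synonym` straight at the root.
    for source, target in list(master_of.items()):
        if target == synonym:
            master_of[source] = root
    master_of[synonym] = root


store: dict[str, str] = {}
add_synonym(store, "java-script", "js")
add_synonym(store, "js-web", "js")
add_synonym(store, "js", "javascript")
print(store)  # {'java-script': 'javascript', 'js-web': 'javascript', 'js': 'javascript'}
```

Because no stored mapping ever points at another synonym, lookups never need more than one hop, and an attempt such as `add_synonym(store, "javascript", "js")` is rejected with a `ValueError`.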

Answered Jan 19 '23 at 06:01 by Mark McLaren