I asked this question on Meta, but I now realize that it may be more appropriate for the main site, as it is a general question that relates to any tag-based system. (I am happy to close or delete one of them, depending on where people think this question should go.)
I have a similar system of tagged data, and I am running into the same problem Stack Overflow did: I have lots of tags that really mean the same thing. I am trying to create a tag synonym page similar to Stack Overflow's to help organize this information.
A few questions around the relationships and "data model" of tag synonyms:
I assume that a master tag can have multiple synonym tags but a synonym tag can only be a synonym for one master tag. Is that correct?
Also, can a master tag also be a synonym tag? For example, let's say you have a tag called javascript and you had:
Master: js
Synonyms: java-script, js-web
Can you also have:
Master: javascript
Synonyms: js
So in the example above, resolving js-web would continue through js and ultimately end at javascript, because the master tag js is itself a synonym tag.
Also, that makes me think you could run into a circular reference, where you have:
Master: js
Synonyms: java-script
and
Master: javascript
Synonyms: js
How does the system deal with circular references?
It is tempting to give you a more theoretical answer on meta concerning folksonomies, polysemy and such! Since I am answering on the StackOverflow side I will try and give a marginally more technical answer. Running queries using the StackOverflow Data Explorer will allow me to attempt to answer your questions (I am not affiliated with StackOverflow so I can't know for sure).
On StackOverflow the master/synonym tag relationship is carefully stewarded and cultivated. At the time of writing from the Data Explorer:
It is interesting to contrast this with other folksonomies. One article, "Technorati tags: Good idea, terrible implementation", states:
"Technorati advertises that they're now tracking 466,951 different tags, which is pretty darn impressive when you consider that a typical dictionary has around 75,000 entries"
A quick caveat: I usually write Oracle SQL, and I assume the Data Explorer uses SQL Server, so my queries may be a little amateurish. First, my presumptions about the data:
Now to your specific queries:
"I assume that a master tag can have multiple synonym tags but a synonym tag can only be a synonym for one master tag. Is that correct?"
select * from TagSynonyms where TargetTagName = 'javascript'
Result: Yes. A master tag can have multiple synonym tags.
select SourceTagName, count(*) from TagSynonyms group by SourceTagName having count(*) > 1
Result: Yes. No source tag appears more than once, so a synonym tag can only be a synonym for one master tag.
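For your own system, the one-to-many relationship found above can be modelled quite simply. The following is only a minimal sketch, assuming an in-memory mapping rather than a database table; the name `synonym_to_master`, the helper `synonyms_by_master` and the sample tags (borrowed from the question) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical synonym table: key = synonym (source) tag, value = master (target) tag.
# Keying by the synonym means each synonym can point at exactly one master,
# while nothing stops several synonyms from sharing the same master.
synonym_to_master = {
    "java-script": "javascript",
    "js": "javascript",
}


def synonyms_by_master(mapping):
    """Invert the mapping so each master tag lists all of its synonyms."""
    grouped = defaultdict(list)
    for synonym, master in mapping.items():
        grouped[master].append(synonym)
    # Sort for a stable, readable result.
    return {master: sorted(synonyms) for master, synonyms in grouped.items()}
```

In a relational schema the equivalent would be a uniqueness constraint on the source tag column, which is an assumption about how one might build it and not something the Data Explorer confirms.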
"Also, can a master tag also be a synonym tag?"
select TagName from Tags
intersect
select SourceTagName from TagSynonyms
Result: Yes. A master tag can also be a synonym tag. When I ran this query there were 465 tags that were both synonym and master.
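If a master tag can itself be a synonym, then looking up a tag means following the chain until you reach a tag that is not a synonym of anything. Here is a rough sketch of that lookup, assuming the same kind of synonym-to-master mapping; `resolve_tag` is a hypothetical helper, not how Stack Overflow is known to do it, and the sample chain is the one from the question.

```python
def resolve_tag(tag, synonym_to_master):
    """Follow synonym links until reaching a tag that is not itself a synonym."""
    seen = {tag}
    while tag in synonym_to_master:
        tag = synonym_to_master[tag]
        if tag in seen:
            # We have come back to a tag already visited: the chain is circular.
            raise ValueError(f"circular synonym chain at {tag!r}")
        seen.add(tag)
    return tag


# The chain from the question: js-web -> js -> javascript.
chain = {"java-script": "js", "js-web": "js", "js": "javascript"}
ultimate = resolve_tag("js-web", chain)  # "javascript"
```

The `seen` set is there purely as a safety net, so that bad data raises an error instead of looping forever.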
"How does the system deal with circular references?"
This is where my logic/SQL may let me down. The question is: can I find any circular references? To do this I think I need to work out:
Anything in set c would be a circular reference.
We have already calculated set a above (it has 465 rows).
Set b - synonyms for the synonyms of set a
select SourceTagName from TagSynonyms where TargetTagName in (
  select SourceTagName from TagSynonyms where TargetTagName in (
    select TagName from Tags
    intersect
    select SourceTagName from TagSynonyms
  )
)
Result: 0 rows
We can stop here: there is no point working out set c, as we already know set b is empty.
Unless I got my logic or SQL wrong (which is very possible), it seems there are no circular references in StackOverflow. I would imagine there are technical processes in place to prevent circular references from happening (otherwise StackOverflow could suffer StackOverflow!).
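I can only guess at what those processes look like, but in your own system one plausible approach is to reject a new synonym at the moment it is created if it would close a loop. The sketch below assumes a synonym-to-master dictionary; `add_synonym` and the sample tags are hypothetical and are not taken from StackOverflow's implementation.

```python
def add_synonym(synonym_to_master, source, target):
    """Register source as a synonym of target, refusing anything that creates a cycle."""
    if source in synonym_to_master:
        # A synonym may only belong to one master.
        raise ValueError(f"{source!r} is already a synonym of {synonym_to_master[source]!r}")

    # Walk upwards from the proposed master. If we ever reach the proposed
    # synonym, adding this link would make the chain circular.
    current = target
    visited = set()
    while True:
        if current == source:
            raise ValueError(f"{source!r} -> {target!r} would create a circular reference")
        if current in visited or current not in synonym_to_master:
            break  # reached the top of the chain, or hit a cycle that already existed
        visited.add(current)
        current = synonym_to_master[current]

    synonym_to_master[source] = target


tags = {}
add_synonym(tags, "js", "javascript")  # js -> javascript
add_synonym(tags, "js-web", "js")      # js-web -> js -> javascript
# add_synonym(tags, "javascript", "js-web") would raise ValueError here.
```

Checking at write time keeps every lookup simple, because a stored chain can then never loop back on itself.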