
What's the difference between WordNet 3.1 and WordNet 3.0?

Tags:

nlp

wordnet

There doesn't seem to be a changelog or something of that sort available at wordnet.princeton.edu

piggs_boson Avatar asked Sep 06 '15 12:09



1 Answer

To add to @abarisone's answer, the actual synset IDs themselves can differ between WordNet 3.0 and WordNet 3.1 :(

For example, in WordNet 3.1 a chair is 103005231-n.

In WordNet 3.0, however, it was 103001627-n. You cannot look that ID up at http://wordnet-rdf.princeton.edu/wn31/103001627-n or at http://wordnet-rdf.princeton.edu/wn30/103001627-n. Instead you have to use http://wordnet-rdf.princeton.edu/wn30/03001627-n (without the leading 1), which incorrectly redirects to 102992974-n.

I think it's a bug in the WordNet RDF 3.1 online app, because 102992974-n doesn't officially exist. You can't even search for it, either online or offline. And if you download the RDF/JSON-LD file from that page, it gives you 103005231-n.

In wn3.1.dict/dict/index.noun :

chair n 5 4 @ ~ %p + 5 2 03005231 00599171 10488547 03275941 03005700  

There's no mention of 02992974 anywhere in that file.
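To check which offsets an index file actually lists for a lemma, a minimal Python sketch along these lines should do. It assumes the index file field layout documented in wndb(5WN): lemma, pos, synset_cnt, p_cnt, the pointer symbols, sense_cnt, tagsense_cnt, then the synset offsets. The names IndexEntry, parse_index_line and noun_rdf_id are my own, not part of any WordNet tooling, and the leading "1" for noun IDs is only inferred from the IDs quoted above.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    lemma: str
    pos: str
    pointer_symbols: List[str]
    sense_count: int
    tagsense_count: int
    synset_offsets: List[str]


def parse_index_line(line: str) -> IndexEntry:
    """Parse one line of a WordNet index.<pos> file (layout assumed from wndb(5WN))."""
    fields = line.split()
    lemma, pos = fields[0], fields[1]
    synset_cnt = int(fields[2])
    p_cnt = int(fields[3])
    # p_cnt pointer symbols follow, then sense_cnt and tagsense_cnt.
    pointers = fields[4:4 + p_cnt]
    sense_cnt = int(fields[4 + p_cnt])
    tagsense_cnt = int(fields[5 + p_cnt])
    # The remaining synset_cnt fields are the 8-digit synset offsets.
    offsets = fields[6 + p_cnt:6 + p_cnt + synset_cnt]
    if len(offsets) != synset_cnt:
        raise ValueError(f"expected {synset_cnt} offsets, found {len(offsets)}")
    return IndexEntry(lemma, pos, pointers, sense_cnt, tagsense_cnt, offsets)


def noun_rdf_id(offset: str) -> str:
    """Build the RDF-style noun ID seen above (assumption: '1' + offset + '-n')."""
    return f"1{offset}-n"


entry = parse_index_line(
    "chair n 5 4 @ ~ %p + 5 2 03005231 00599171 10488547 03275941 03005700"
)
print(entry.synset_offsets)
print(noun_rdf_id(entry.synset_offsets[0]))  # 103005231-n
```

Running the same check with "02992974" in entry.synset_offsets gives False, which matches what grepping the file shows.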

Both of these issues are confusing. I wonder why they changed the synset IDs in a minor revision.


Regarding the status of WordNet synset IDs:

The conclusion is that, for now, using WordNet 3.0 synset IDs is the safest option.

For future work, consider using the Inter-Lingual Index from the Global WordNet Association (coming soon), which will have IDs compatible with WordNet 3.0.

References from the wn-users mailing list, 30 Oct 2015:

From: Raphael, Nicholas

The URI is built from the “dblocation” field, which is a byte offset from the beginning of the relevant character-based database file (I’m not sure which). This will change from release to release as items are removed and added and moved around.

.

From: Peter Clark

To the best of my knowledge…. FYI a little known fact is that the sense keys (e.g., “ability%1:07:00::”) are stable between releases, except when senses are split or merged. This provides a stable way to refer to synsets across releases, rather than use synset numbers. Also you can find the mappings between synset numbers in different releases by looking for the same sense keys. (sensekey->synset is a many-to-1 mapping: A synset may have multiple sense keys, one for each word+sense in the synset. But a sense key maps to exactly one synset). Best wishes, Pete

.

From: John McCrae

Hello Hendy,

Yes WordNet synset Identifiers are based on the byte offset of the descriptor in a given release of WordNet, as such they are far from stable across versions of WordNets. The sense identifiers are more stable but still can be unreliable as senses do get split and merged. Also, there are two slightly different versions of WordNet 3.1 and the WordNet RDF version accepts synset identifiers from either... this is of course, as others have commented, all very confusing.

For this reason, the Global WordNet Association has started work on an Inter-Lingual Index, which we expect to be online soon (i.e., in time for the Global WordNet Conference in January), and will give each synset a single unchanging URI.

Piek Vossen gave a good talk about this recently and the slides are online here: http://ldl2014.org/slides/Vossen-LOD-CILI.pdf

For the moment, I would recommend using WN 3.0 identifiers to link synsets, which the WordNet Interlingual Index will also be based on.

Regards, John
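The sense-key approach from Peter Clark's message above can be sketched in Python roughly as follows. This assumes the index.sense layout documented in senseidx(5WN), where each line is "sense_key synset_offset sense_number tag_cnt", and that the ss_type digit sits right after the "%" in the sense key. The function names are my own, and the sample lines are illustrative only: the chair offsets are the ones quoted in this answer, but the sense key, sense numbers and tag counts are placeholders rather than lines copied from the real files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def load_sense_index(lines: Iterable[str]) -> Dict[str, str]:
    """Map sense_key -> synset_offset from index.sense-style lines."""
    mapping: Dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mapping[fields[0]] = fields[1]
    return mapping


def ss_type(sense_key: str) -> str:
    """Return the ss_type digit that follows '%' in a sense key."""
    return sense_key.rpartition("%")[2].split(":")[0]


def map_offsets(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[Tuple[str, str], Set[str]]:
    """Map (ss_type, old offset) -> new offsets that share a sense key.

    Offsets are only unique within one part of speech, so ss_type is part
    of the key. More than one new offset for an entry hints at a split.
    """
    result: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for key, old_offset in old.items():
        new_offset = new.get(key)
        if new_offset is None:
            continue  # sense key not present in the newer release
        result[(ss_type(key), old_offset)].add(new_offset)
    return dict(result)


# Illustrative sample lines (placeholders, see the note above).
WN30_SAMPLE = [
    "chair%1:06:00:: 03001627 1 0",
    "hypothetical%1:00:00:: 00000001 1 0",  # made-up key, only in the old sample
]
WN31_SAMPLE = ["chair%1:06:00:: 03005231 1 0"]

mapping = map_offsets(load_sense_index(WN30_SAMPLE), load_sense_index(WN31_SAMPLE))
print(mapping)
```

With the real files you would pass open("index.sense") from each release instead of the sample lists. Sense keys that were split or merged between releases still need manual review, as the messages above point out.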

Hendy Irawan Avatar answered Oct 03 '22 20:10
