
Help with WordNet data file-format

Tags: wordnet

I have a question about the WordNet data file-format. The wndb(5) manual page says in part:

The source/target field distinguishes lexical and semantic pointers. It is a four byte field, containing two two-digit hexadecimal integers. The first two digits indicates the word number in the current (source) synset, the last two digits indicate the word number in the target synset. A value of 0000 means that pointer_symbol represents a semantic relation between the current (source) synset and the target synset indicated by synset_offset.

A lexical relation between two words in different synsets is represented by non-zero values in the source and target word numbers. The first and last two bytes of this field indicate the word numbers in the source and target synsets, respectively, between which the relation holds. Word numbers are assigned to the word fields in a synset, from left to right, beginning with 1.

I understand the second paragraph, where the source/target numbers are non-zero, but what it means when the source/target field is "0000" still isn't clear to me.
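For reference, the field layout described in the quoted passage can be sketched as follows. This is only an illustration of the quoted wording; the function names `decode_source_target` and `is_semantic` are hypothetical and not part of any WordNet library.

```python
def decode_source_target(field: str) -> tuple[int, int]:
    """Split a four-character source/target field into two word numbers.

    Each half is a two-digit hexadecimal integer, as stated in wndb(5).
    """
    if len(field) != 4:
        raise ValueError(f"expected a four-character field, got {field!r}")
    source = int(field[:2], 16)  # word number in the current (source) synset
    target = int(field[2:], 16)  # word number in the target synset
    return source, target


def is_semantic(field: str) -> bool:
    """A value of 0000 marks a semantic (synset-to-synset) pointer."""
    return decode_source_target(field) == (0, 0)


print(decode_source_target("0306"))  # lexical pointer: word 3 to word 6
print(is_semantic("0000"))           # semantic pointer
```

With the "aristocrat" pointers used in this question, "0306" and "0102" decode to non-zero word numbers (lexical), while every "0000" field is semantic.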

Let me take an example for the word "aristocrat." The index.noun entry is:

aristocrat n 1 4 @ ~ #m + 1 0 09807754

and the corresponding data.noun entry is:

09807754 18 n 03 aristocrat 0 blue_blood 0 patrician 0 013 @ 09623038 n 0000 #m 08388207 n 0000 + 01590484 a 0306 + 01590484 a 0102 ~ 09840639 n 0000 ~ 09872782 n 0000 ~ 10083823 n 0000 ~ 10175090 n 0000 ~ 10285135 n 0000 ~ 10472799 n 0000 ~ 10474064 n 0000 ~ 10505732 n 0000 ~ 10506642 n 0000 | a member of the aristocracy

the first "ptr" for which is:

@ 09623038 n 0000

and that data.noun entry begins with:

09623038 18 n 01 leader 0 058 @ 00007846 n 0000 ...

What's not clear to me is which word(s) this relation applies to. Does the hypernym ("@") relation hold only from the original word ("aristocrat") to all words in the target synset (in this case, there's only "leader")?

Or does the relation hold for all words in the source synset ("aristocrat", "blue blood", and "patrician") to all words in the target synset?

Paul J. Lucas asked Nov 14 '22 05:11


1 Answer

The relation indeed holds for all words in the source synset to all words in the target synset.

This does not mean that leader is always a hypernym of aristocrat. The relation holds between the sense of aristocrat under consideration (a member of the aristocracy) and the sense of leader under consideration (a person who rules or guides or inspires others). Some relations can sound weird, but WordNet isn't perfect and cannot be.
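To make this concrete, here is a minimal sketch that parses the data.noun line from the question and lists, for each pointer, which source words it applies to. It assumes the field order given in wndb(5) (w_cnt as two-digit hexadecimal, p_cnt as decimal) and a noun entry with no verb frames; `parse_data_line` and `source_words` are hypothetical helper names, not a WordNet API.

```python
DATA_LINE = (
    "09807754 18 n 03 aristocrat 0 blue_blood 0 patrician 0 013 "
    "@ 09623038 n 0000 #m 08388207 n 0000 + 01590484 a 0306 "
    "+ 01590484 a 0102 ~ 09840639 n 0000 ~ 09872782 n 0000 "
    "~ 10083823 n 0000 ~ 10175090 n 0000 ~ 10285135 n 0000 "
    "~ 10472799 n 0000 ~ 10474064 n 0000 ~ 10505732 n 0000 "
    "~ 10506642 n 0000 | a member of the aristocracy"
)


def parse_data_line(line: str):
    """Return (words, pointers, gloss) for one data-file line."""
    fields_part, _, gloss = line.partition("|")
    f = fields_part.split()
    w_cnt = int(f[3], 16)  # number of words, hexadecimal
    # Each word is followed by its lex_id, hence the step of 2.
    words = [f[4 + 2 * i] for i in range(w_cnt)]
    i = 4 + 2 * w_cnt
    p_cnt = int(f[i])  # number of pointers, decimal
    i += 1
    pointers = []
    for _ in range(p_cnt):
        symbol, offset, pos, st = f[i:i + 4]
        i += 4
        pointers.append((symbol, offset, pos, int(st[:2], 16), int(st[2:], 16)))
    return words, pointers, gloss.strip()


def source_words(words, source_number):
    """0 means the whole synset; otherwise a single 1-based word."""
    return list(words) if source_number == 0 else [words[source_number - 1]]


words, pointers, gloss = parse_data_line(DATA_LINE)
for symbol, offset, pos, src, tgt in pointers:
    print(symbol, offset, pos, source_words(words, src), "target word", tgt)
```

In this sketch, the "@" pointer to 09623038 is reported for aristocrat, blue_blood, and patrician together, whereas the two "+" pointers are each reported for a single word (patrician and aristocrat, respectively).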

Quentin Pradet answered Dec 31 '22 04:12