How to use the language option in synsets (nltk) if you load a wordnet manually?

Tags: python, path, nlp, nltk, wordnet

For specific purposes I have to use WordNet 1.6 instead of the current version shipped with the nltk package. I downloaded the old version here and tried to run a short piece of code using the French option.

from collections import defaultdict
import nltk
#nltk.download() 
import os
import sys
from nltk.corpus import WordNetCorpusReader

cwd = os.getcwd()
nltk.data.path.append(cwd)
wordnet16_dir="wordnet-1.6/"
wn16_path = "{0}/dict".format(wordnet16_dir)
wn = WordNetCorpusReader(os.path.abspath("{0}/{1}".format(cwd, wn16_path)), nltk.data.find(wn16_path))

senses=wn.synsets('gouvernement',lang=u'fre')

It seems that the WordNet I downloaded manually cannot be linked to the files of the nltk module that deal with foreign languages. The error I get is the following:

Traceback (most recent call last):
File "C:/Users/Stephanie/Test/temp.py", line 16, in <module>
senses=wn.synsets('gouvernement',lang=u'fre')
File "C:\Python27\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1419, in synsets
self._load_lang_data(lang)
File "C:\Python27\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1064, in _load_lang_data
if lang not in self.langs():
File "C:\Python27\lib\site-packages\nltk\corpus\reader\wordnet.py", line 1088, in langs
fileids = self._omw_reader.fileids()
AttributeError: 'FileSystemPathPointer' object has no attribute 'fileids'

Using an English word doesn't generate any error (so the dictionary itself is loaded correctly):

senses=wn.synsets('government')
print senses

[Synset('government.n.01'), Synset('government.n.02'), Synset('government.n.03'), Synset('politics.n.02')]

If I use the current version of WordNet that ships with the nltk module, I have no problem using French (so it's not a syntax problem with the optional argument):

from nltk.corpus import wordnet as wn
senses=wn.synsets('gouvernement',lang=u'fre')
print senses
[Synset('government.n.02'), Synset('opinion.n.05'), Synset('government.n.03'), Synset('rule.n.01'), Synset('politics.n.02'), Synset('government.n.01'), Synset('regulation.n.03'), Synset('reign.n.03')]

But, as stated above, I really have to use the old version. I guess this might be a path problem. I've been trying to read the code of WordNetCorpusReader, but I am quite new to Python and so far I don't really see what the problem is, except that it doesn't find a particular file.

The needed file seems to be wn-data-fre.tab, which is located in \nltk_data\corpora\omw\fre. I am pretty sure that I have to replace that file with a version compatible with WordNet 1.6, but still, why can't WordNetCorpusReader find it?

asked Jul 17 '15 14:07

Stéphanie C

1 Answer

Short Answer:

The language parameter is not available for WordNet 1.6. There is no way to use lang='fre' when you load a different WordNet version through NLTK.

Long Answer:

The lang=... parameter is an addition built on the Open Multilingual WordNet (OMW: http://compling.hss.ntu.edu.sg/omw/), which links wordnets of different languages to the Princeton WordNet version 3.0. See https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1050.

The lang=... parameter calls the function:

def langs(self):
    ''' return a list of languages supported by Multilingual Wordnet '''
    import os
    langs = []
    fileids = self._omw_reader.fileids()
    for fileid in fileids:
        file_name, file_extension = os.path.splitext(fileid)
        if file_extension == '.tab':
            langs.append(file_name.split('-')[-1])

    return langs
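
To make it concrete what langs() derives from the OMW file listing, here is a minimal stand-alone sketch of the same parsing logic. The helper name langs_from_fileids and the sample file ids are assumptions made for illustration; they are not part of NLTK.

```python
import os

def langs_from_fileids(fileids):
    """Extract language codes from OMW-style file ids (hypothetical helper)."""
    langs = []
    for fileid in fileids:
        file_name, file_extension = os.path.splitext(fileid)
        if file_extension == '.tab':
            # 'fre/wn-data-fre' -> 'fre'
            langs.append(file_name.split('-')[-1])
    return langs

# Sample ids modeled on the nltk_data/corpora/omw layout (assumed, not read from disk).
sample_fileids = ['fre/wn-data-fre.tab', 'fre/LICENSE', 'jpn/wn-data-jpn.tab']
print(langs_from_fileids(sample_fileids))
```

Only entries ending in .tab contribute a language code, so a reader that cannot list the OMW files cannot report 'fre' as supported.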

The language data file is then opened, see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1070:

 f = self._omw_reader.open('{0:}/wn-data-{0:}.tab'.format(lang))

So if lang == 'fre', then self._omw_reader is asked to open fre/wn-data-fre.tab.
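
The path template can be checked in isolation. This is a small sketch; omw_tab_path is a hypothetical helper name, and only the format string is taken from the line quoted above.

```python
def omw_tab_path(lang):
    """Relative path of the OMW tab file for a language code (hypothetical helper)."""
    # '{0:}' appears twice, so the code is used as the directory and in the file name.
    return '{0:}/wn-data-{0:}.tab'.format(lang)

print(omw_tab_path('fre'))
```

The path is relative to whatever root the OMW reader was created with, which is why the reader passed to WordNetCorpusReader matters.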

The main reason the OMW reader can't find wn-data-fre.tab in nltk_data/corpora/omw/ is that you set the omw_reader to a pointer to wn16_path when initializing the WordNetCorpusReader object, see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1006.

Then, when the French data is loaded, the reader cannot get as far as self._omw_reader.open('{0:}/wn-data-{0:}.tab'.format(lang)): the traceback shows it failing in langs(), on self._omw_reader.fileids(). (see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1419 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1070)
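
The AttributeError in the question's traceback can be imitated without NLTK. In the sketch below, FakePathPointer, FakeOmwReader and can_list_languages are invented stand-ins, not NLTK classes or functions. They only illustrate that an object holding a path is not the same thing as a reader offering fileids().

```python
class FakePathPointer(object):
    """Stand-in for a path pointer: it knows a path but has no fileids() method."""
    def __init__(self, path):
        self.path = path

class FakeOmwReader(object):
    """Stand-in for a corpus reader over the OMW directory."""
    def fileids(self):
        return ['fre/wn-data-fre.tab']

def can_list_languages(omw_reader):
    """Return True if the object supports the fileids() call that langs() relies on."""
    try:
        omw_reader.fileids()
        return True
    except AttributeError:
        return False

print(can_list_languages(FakeOmwReader()))
print(can_list_languages(FakePathPointer('wordnet-1.6/dict')))
```

The second call fails in the same way as the traceback: the object given as the OMW reader has no fileids attribute.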

What you can try is to load two instances of WordNet:

import os
import nltk  # needed for nltk.data below
from nltk.corpus import wordnet as wn  # WordNet 3.0, the version OMW is linked to
from nltk.corpus import WordNetCorpusReader

cwd = os.getcwd()
nltk.data.path.append(cwd)
wordnet16_dir = "wordnet-1.6/"

wn16_path = "{0}/dict".format(wordnet16_dir)
wn16 = WordNetCorpusReader(os.path.abspath("{0}/{1}".format(cwd, wn16_path)), nltk.data.find(wn16_path))

def synset2offset(ss):
    # Identifier of the form '<8-digit offset>-<pos>'
    return str(ss.offset()).zfill(8) + '-' + ss.pos()


# Sets make the membership test below fast
wn16_ids = set(synset2offset(ss) for ss in wn16.all_synsets())
wn30_ids = set(synset2offset(ss) for ss in wn.all_synsets())


# The French lookup goes through WordNet 3.0; keep only the ids that also exist in WordNet 1.6
senses30 = wn.synsets('gouvernement', lang=u'fre')
senses16 = [ss for ss in senses30 if synset2offset(ss) in wn16_ids]
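
The filtering step can be tried on plain data first. This is a minimal sketch with made-up (offset, pos) pairs rather than real WordNet entries; format_synset_id is a hypothetical helper that mirrors synset2offset. It only compares identifiers, so it assumes, as the approach above does, that a matching offset in both versions refers to the same synset.

```python
def format_synset_id(offset, pos):
    """Build an 'offset-pos' id, zero-padding the offset to 8 digits."""
    return str(offset).zfill(8) + '-' + pos

# Made-up (offset, pos) pairs standing in for synsets; not real WordNet data.
ids_in_wn16 = set(format_synset_id(o, p) for o, p in [(1234, 'n'), (5678, 'v')])
french_candidates = [(1234, 'n'), (9999, 'n'), (5678, 'v')]

# Keep only the candidates whose id is also present in the WordNet 1.6 id set.
kept = [c for c in french_candidates if format_synset_id(*c) in ids_in_wn16]
print(kept)
```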

answered Sep 23 '22 19:09

alvas
