What would cause WordNetCorpusReader to have no attribute LazyCorpusLoader?

I've got a short function that checks whether a word is a real word by comparing it to the WordNet corpus from the Natural Language Toolkit. I'm calling this function from a thread that validates txt files. When I run my code, the first call to the function throws an AttributeError with the message

"'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'"

When I pause execution, the same line of code does not throw an error, so I assume the corpus is not yet loaded on my first call, which causes the error.

I have tried calling wn.ensure_loaded() to force the corpus to load, but I'm still getting the same error.

Here's my function:

from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords
from nltk.corpus.reader.wordnet import WordNetError
import sys

cachedStopWords = stopwords.words("english")

def is_good_word(word):
    word = word.strip()
    if len(word) <= 2:
        return 0
    if word in cachedStopWords:
        return 0
    try:
        wn.ensure_loaded()
        if len(wn.lemmas(str(word), lang='en')) == 0:
            return 0
    except WordNetError as e:
        print("WordNetError on concept {}".format(word))
    except AttributeError as e:
        # str(e) works on both Python 2 and 3; e.message is Python 2 only
        print("Attribute error on concept {}: {}".format(word, e))
    except:
        print("Unexpected error on concept {}: {}".format(word, sys.exc_info()[0]))
    else:
        return 1
    return 1

print (is_good_word('dog')) #Does NOT throw error

If I call the function from a print statement at the global scope of the same file, it does not throw the error. However, if I call it from my thread, it does. The following is a minimal example that reproduces the error. I've tested it, and on my machine it gives the output

Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
Attribute error on concept dog: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'

Minimal example:

import time
import threading
from filter_tag import is_good_word

class ProcessMetaThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        is_good_word('dog') #Throws error


def process_meta(numberOfThreads):

    threadsList = []
    for i in range(numberOfThreads):
        t = ProcessMetaThread()
        t.setDaemon(True)
        t.start()
        threadsList.append(t)

    numComplete = 0
    while numComplete < numberOfThreads:
        # Iterate over the active processes
        for processNum in range(0, numberOfThreads):
            # If a process actually exists
            if threadsList != None:
                # If the process is finished
                if not threadsList[processNum] == None:
                    if not threadsList[processNum].is_alive():
                        numComplete += 1
                        threadsList[processNum] = None
        time.sleep(5)

    print('Processes Finished')


if __name__ == '__main__':
    process_meta(10)

Cecilia asked Dec 11 '14 22:12


1 Answer

I ran your code and got the same error. A working solution is given at the end of this answer. First, the explanation:

LazyCorpusLoader is a proxy object that stands in for a corpus object before the corpus is loaded. (This prevents NLTK from loading massive corpora into memory before you need them.) The first time this proxy object is accessed, however, it becomes the corpus you intend to load. That is to say, the LazyCorpusLoader proxy object replaces its own __dict__ and __class__ with the __dict__ and __class__ of the corpus you are loading.
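
To make the transformation concrete, here is a minimal sketch of the same trick outside NLTK. The names LazyProxy and RealCorpus are hypothetical stand-ins, not NLTK classes, and the real LazyCorpusLoader does more work than this.

```python
class RealCorpus(object):
    """Hypothetical stand-in for a fully loaded corpus reader."""
    def __init__(self, words):
        self.words = words

    def contains(self, word):
        return word in self.words


class LazyProxy(object):
    """Toy proxy that turns itself into a RealCorpus on first use."""
    def __init__(self, *args):
        self.__args = args  # stored on the instance as _LazyProxy__args

    def __load(self):
        corpus = RealCorpus(*self.__args)
        # Become the real object: swap the instance dict and the class.
        self.__dict__ = corpus.__dict__
        self.__class__ = corpus.__class__

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. before loading.
        self.__load()
        return getattr(self, name)


proxy = LazyProxy({"dog", "cat"})
print(type(proxy).__name__)    # still the proxy class
print(proxy.contains("dog"))   # first access triggers the load
print(type(proxy).__name__)    # now the real class
```

After the first attribute access the object no longer has any of the proxy's attributes, which is what matters for the threading problem described next.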

If you compare your code to your errors above, you can see that you received 9 errors when you tried to create 10 instances of your class. The first transformation of the LazyCorpusLoader proxy object into a WordNetCorpusReader object was successful. This action was triggered when you accessed wordnet for the first time:

The First Thread

from nltk.corpus import wordnet as wn
def is_good_word(word):
    ...
    wn.ensure_loaded()  # `LazyCorpusLoader` conversion into `WordNetCorpusReader` starts

The Second Thread

When your is_good_word function starts running in a second thread, the first thread has not yet finished transforming the LazyCorpusLoader proxy object into a WordNetCorpusReader. wn is still a LazyCorpusLoader proxy object, so the second thread begins the __load process again. By the time the second thread reaches the point where it converts its __class__ and __dict__ into those of a WordNetCorpusReader object, however, the first thread has already completed the conversion. My guess is that you are running into an error in the line with my comment below:

class LazyCorpusLoader(object):
    ...
    def __load(self):
        ...
        corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)  # load corpus
        ...
        # self.__args == self._LazyCorpusLoader__args
        args, kwargs  = self.__args, self.__kwargs                       # most likely the line throwing the error

Once the first thread has transformed the LazyCorpusLoader proxy object into a WordNetCorpusReader object, the mangled names will no longer work. The WordNetCorpusReader object will not have LazyCorpusLoader anywhere in its mangled names. (self.__args is equivalent to self._LazyCorpusLoader__args while the object is a LazyCorpusLoader object.) Thus you get the following error:

AttributeError: 'WordNetCorpusReader' object has no attribute '_LazyCorpusLoader__args'
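
The name-mangling effect can be reproduced without NLTK or threads. The sketch below uses hypothetical classes Loader and Reader, and simulates "another thread finished the transformation first" by swapping __dict__ and __class__ by hand before a method of the old class runs.

```python
class Loader(object):
    def __init__(self):
        self.__args = ("wordnet",)   # stored as _Loader__args

    def read_args(self):
        return self.__args           # compiled as self._Loader__args


class Reader(object):
    pass


obj = Loader()
print(sorted(vars(obj)))             # shows the mangled attribute name

# Keep a bound method that still runs Loader's code on this object.
stale_method = obj.read_args

# Simulate the transformation having completed elsewhere.
obj.__dict__ = {}
obj.__class__ = Reader

message = None
try:
    stale_method()
except AttributeError as e:
    message = str(e)
    print(message)
```

The mangled name is fixed when the method is compiled, so code from the old class keeps asking for _Loader__args even after the object has become a Reader.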

An Alternative

In light of this issue, you will want to access the wn object once before you start any threads, so that the transformation has finished by the time they run. Here is your code modified accordingly:

from nltk.corpus import wordnet as wn
from nltk.corpus import stopwords
from nltk.corpus.reader.wordnet import WordNetError
import sys
import time
import threading

cachedStopWords = stopwords.words("english")


def is_good_word(word):
    word = word.strip()
    if len(word) <= 2:
        return 0
    if word in cachedStopWords:
        return 0
    try:
        if len(wn.lemmas(str(word), lang='en')) == 0:     # no longer the first access of wn
            return 0
    except WordNetError as e:
        print("WordNetError on concept {}".format(word))
    except AttributeError as e:
        print("Attribute error on concept {}: {}".format(word, e))  # e.message is Python 2 only
    except:
        print("Unexpected error on concept {}: {}".format(word, sys.exc_info()[0]))
    else:
        return 1
    return 1


class ProcessMetaThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        is_good_word('dog')


def process_meta(numberOfThreads):
    print(wn.__class__)           # <class 'nltk.corpus.util.LazyCorpusLoader'>
    wn.ensure_loaded()            # first access to wn transforms it
    print(wn.__class__)           # <class 'nltk.corpus.reader.wordnet.WordNetCorpusReader'>
    threadsList = []
    for i in range(numberOfThreads):
        start = time.time()       # time.clock() was removed in Python 3.8
        t = ProcessMetaThread()
        print(time.time() - start)
        t.setDaemon(True)
        t.start()
        threadsList.append(t)

    numComplete = 0
    while numComplete < numberOfThreads:
        # Iterate over the active processes
        for processNum in range(0, numberOfThreads):
            # If a process actually exists
            if threadsList != None:
                # If the process is finished
                if not threadsList[processNum] == None:
                    if not threadsList[processNum].is_alive():
                        numComplete += 1
                        threadsList[processNum] = None
        time.sleep(5)

    print('Processes Finished')


if __name__ == '__main__':
    process_meta(10)

I have tested the above code and received no errors.
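
If loading the corpus before the threads start is not convenient, another option would be to serialize the first access with a lock. This is only a sketch under assumptions: ensure_loaded_safely and FakeCorpus are hypothetical names used so the example runs without NLTK, and with the real library you would presumably pass wn instead of the fake object. I have not tested this variant against NLTK itself.

```python
import threading

_load_lock = threading.Lock()


def ensure_loaded_safely(corpus):
    """Serialize the first access so only one thread triggers the load."""
    with _load_lock:
        corpus.ensure_loaded()


class FakeCorpus(object):
    """Hypothetical stand-in that counts how often it is actually loaded."""
    def __init__(self):
        self.loaded = False
        self.load_count = 0

    def ensure_loaded(self):
        if not self.loaded:
            self.load_count += 1
            self.loaded = True


corpus = FakeCorpus()
threads = [threading.Thread(target=ensure_loaded_safely, args=(corpus,))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # wait for every thread instead of polling
print(corpus.load_count)
```

Each thread would call ensure_loaded_safely at the top of its run method; once the load has completed, the lock is held only briefly on later calls.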

Justin O Barber answered Oct 18 '22 21:10