I am new to both python and nltk. I have converted the code from https://gist.github.com/alexbowe/879414 to the below given code to make it run for many documents/text chunks. But I got the following error
Traceback (most recent call last):
File "E:/NLP/PythonProgrames/NPExtractor/AdvanceMain.py", line 16, in <module>
result = np_extractor.extract()
File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 67, in extract
for term in terms:
File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 60, in get_terms
for leaf in self.leaves(tree):
TypeError: leaves() takes 1 positional argument but 2 were given
Can any one help me to fix this problem. I have to extract noun phrases from millions of product reviews. I used Standford NLP kit using Java, but it was extremely slow, so I thought using nltk in python will be better. Please also recommend if there is any better solution.
import nltk
from nltk.corpus import stopwords
stopwords = stopwords.words('english')
grammar = r"""
NBAR:
{<NN.*|JJ>*<NN.*>} # Nouns and Adjectives, terminated with Nouns
NP:
{<NBAR>}
{<NBAR><IN><NBAR>} # Above, connected with in/of/etc...
"""
lemmatizer = nltk.WordNetLemmatizer()
stemmer = nltk.stem.porter.PorterStemmer()
class NounPhraseExtractor(object):
def __init__(self, sentence):
self.sentence = sentence
def execute(self):
# Taken from Su Nam Kim Paper...
chunker = nltk.RegexpParser(grammar)
#toks = nltk.regexp_tokenize(text, sentence_re)
# #postoks = nltk.tag.pos_tag(toks)
toks = nltk.word_tokenize(self.sentence)
postoks = nltk.tag.pos_tag(toks)
tree = chunker.parse(postoks)
return tree
def leaves(tree):
"""Finds NP (nounphrase) leaf nodes of a chunk tree."""
for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
yield subtree.leaves()
def normalise(word):
"""Normalises words to lowercase and stems and lemmatizes it."""
word = word.lower()
word = stemmer.stem_word(word)
word = lemmatizer.lemmatize(word)
return word
def acceptable_word(word):
"""Checks conditions for acceptable word: length, stopword."""
accepted = bool(2 <= len(word) <= 40
and word.lower() not in stopwords)
return accepted
def get_terms(self,tree):
for leaf in self.leaves(tree):
term = [self.normalise(w) for w, t in leaf if self.acceptable_word(w)]
yield term
def extract(self):
terms = self.get_terms(self.execute())
matches = []
for term in terms:
for word in term:
matches.append(word)
return matches
You need to either:
normalize
, acceptable_word
, and leaves
with @staticmethod, orself
parameter as the first parameter of these methods.You're calling self.leaves
which will pass self
as an implicit first parameter to the leaves
method (but your method only takes a single parameter). Making these static methods, or adding a self
parameter will fix this issue.
(your later calls to self.acceptable_word
,and self.normalize
will have the same issue)
You could read about Python's static methods in their docs, or possibly from an external site that may be easier to digest.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With