
How to generate multiple parse trees for an ambiguous sentence in NLTK?

I have the following code in Python.

import nltk

sent = [("very", "ADJ"), ("colourful", "ADJ"), ("ice", "NN"), ("cream", "NN"), ("van", "NN")]
patterns = r"""
  NP: {<ADJ>*<NN>+}
"""
NPChunker = nltk.RegexpParser(patterns)  # create chunk parser
for s in NPChunker.nbest_parse(sent):
    print(s)  # print the bracketed tree; s.draw() would open a GUI window instead

The output is:

(S (NP very/ADJ colourful/ADJ ice/NN cream/NN van/NN))

But the output should include two more parse trees:

(S (NP very/ADJ colourful/ADJ ice/NN) (NP cream/NN) (NP van/NN))
(S (NP very/ADJ colourful/ADJ ice/NN cream/NN) van/NN)

The problem is that the RegexpParser keeps only the first match of the regular expression. How can I generate all possible parse trees at once?

gamma asked Sep 27 '13 18:09


1 Answer

This is not possible with the RegexpParser class. It inherits the nbest_parse method from the ParserI interface, and the source code (https://github.com/nltk/nltk/blob/master/nltk/parse/api.py) shows that the inherited method simply runs the parser's own parse method and returns that single result as an iterable.
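To make that concrete, here is a simplified illustration of the pattern. It is not NLTK's actual source: the class names ToyParserInterface and ToyChunker are hypothetical, and the body of nbest_parse is only an assumption about the general shape of such a default.

```python
class ToyParserInterface(object):
    """Hypothetical stand-in for a parser interface (not NLTK's code)."""

    def parse(self, sent):
        raise NotImplementedError

    def nbest_parse(self, sent, n=None):
        # Default behaviour: call parse() once and wrap its single
        # result in a list, so at most one tree ever comes back.
        tree = self.parse(sent)
        return [tree] if tree is not None else []


class ToyChunker(ToyParserInterface):
    """A deterministic chunker that commits to exactly one analysis."""

    def parse(self, sent):
        return ("S", list(sent))


trees = ToyChunker().nbest_parse([("van", "NN")])
print(len(trees))
```

Because the subclass only overrides parse, asking for the n best parses can never yield more than the one analysis that parse produced.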

As someone tried to explain in "Chunking with nltk", the chunking classes are not the tool to use for this purpose (yet!). Have a look at http://nltk.org/book/ch08.html: it has some quick examples, but they would only take you halfway to what you want to achieve, and the rest would require a lot of pre-processing and smart design.
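If you only need the alternative chunkings for a single rule like the one in the question, one option is to enumerate them yourself. The following is a minimal sketch in plain Python that does not use NLTK at all. The helper names is_np, all_chunkings and to_bracketed are made up for this example, and it assumes an "analysis" may leave any token outside a chunk, as in the third tree the question asks for.

```python
import re

# Tag pattern equivalent to the chunk rule NP: {<ADJ>*<NN>+}
NP_PATTERN = re.compile(r"(<ADJ>)*(<NN>)+")


def is_np(tokens):
    """Return True if the tags of `tokens` match <ADJ>*<NN>+ exactly."""
    tag_string = "".join("<%s>" % tag for _, tag in tokens)
    return NP_PATTERN.fullmatch(tag_string) is not None


def all_chunkings(tokens, start=0):
    """Yield every split of tokens[start:] into NP chunks and loose tokens.

    Each item of an analysis is either a (word, tag) tuple left
    unchunked, or an ("NP", [tokens...]) pair.
    """
    if start == len(tokens):
        yield []
        return
    # Option 1: leave the current token outside any chunk.
    for rest in all_chunkings(tokens, start + 1):
        yield [tokens[start]] + rest
    # Option 2: open an NP chunk here and try every possible end point.
    for end in range(start + 1, len(tokens) + 1):
        if is_np(tokens[start:end]):
            for rest in all_chunkings(tokens, end):
                yield [("NP", tokens[start:end])] + rest


def to_bracketed(analysis):
    """Render an analysis in the bracketed style used in the question."""
    parts = []
    for item in analysis:
        if isinstance(item[1], list):  # an ("NP", [tokens...]) chunk
            words = " ".join("%s/%s" % (w, t) for w, t in item[1])
            parts.append("(NP %s)" % words)
        else:  # a loose (word, tag) token
            parts.append("%s/%s" % item)
    return "(S %s)" % " ".join(parts)


sent = [("very", "ADJ"), ("colourful", "ADJ"), ("ice", "NN"),
        ("cream", "NN"), ("van", "NN")]
results = [to_bracketed(a) for a in all_chunkings(sent)]
for line in results:
    print(line)
```

This brute-force enumeration produces far more analyses than the three in the question, including ones where the adjectives are left unchunked, so you would still need to filter or rank the results to get the ones you care about.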

Viktor Vojnovski answered Sep 28 '22 17:09