 

How to navigate a nltk.tree.Tree?


I've chunked a sentence using:

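```python
import nltk

# `tagged` is the POS-tagged sentence: a list of (word, tag) tuples,
# e.g. the result of nltk.pos_tag(tokens).
grammar = '''
NP:
    {<DT>*(<NN.*>|<JJ.*>)*<NN.*>}
NVN:
    {<NP><VB.*><NP>}
'''
chunker = nltk.chunk.RegexpParser(grammar)
tree = chunker.parse(tagged)
print(tree)
```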

The result looks like:

```
(S
  (NVN
    (NP The_Pigs/NNS)
    are/VBP
    (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
  that/WDT
  formed/VBN
  in/IN
  1977/CD
  ./.)
```

But now I'm stuck trying to figure out how to navigate that. I want to be able to find the NVN subtree, and access the left-side noun phrase ("The_Pigs"), the verb ("are") and the right-side noun phrase ("a Bristol-based punk rock band"). How do I do that?

Roy Smith asked Feb 12 '13 21:02


People also ask

What is an NLTK tree (nltk.tree.Tree)?

A Tree represents a hierarchical grouping of leaves and subtrees. For example, each constituent in a syntax tree is represented by a single Tree. A tree's children are encoded as a list of leaves and subtrees, where a leaf is a basic (non-tree) value; and a subtree is a nested Tree.
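To make the leaves-versus-subtrees distinction concrete, here is a small sketch assuming NLTK 3, where Tree.fromstring parses a bracketed string. The sentence "the dog barked" is a made-up example with no POS tags, so its leaves are plain strings rather than (word, tag) tuples.

```python
from nltk import Tree

# Parse a bracketed string into a Tree (made-up example sentence).
t = Tree.fromstring("(S (NP the dog) (VP barked))")

print(t.label())          # the node's label: S
print(len(t))             # number of children: 2
print(t[0])               # first child is a subtree: (NP the dog)
print(t[0].leaves())      # leaves under that subtree: ['the', 'dog']
print(t[1][0])            # a leaf is a plain value, here a str: barked
print(isinstance(t[0], Tree), isinstance(t[1][0], Tree))  # True False
```

Because a Tree behaves like a list of its children, ordinary indexing and iteration are all you need to walk it.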

What is parsing in NLTK?

NLTK Parsers. Classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text's “parses”.


2 Answers

Try:

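```python
import nltk

ROOT = 'ROOT'
tree = ...  # your parsed nltk.Tree

def getNodes(parent):
    for node in parent:
        # isinstance also accepts Tree subclasses such as ParentedTree
        if isinstance(node, nltk.Tree):
            if node.label() == ROOT:
                print("======== Sentence =========")
                # assumes string leaves (e.g. parser output rooted at 'ROOT')
                print("Sentence:", " ".join(node.leaves()))
            else:
                print("Label:", node.label())
                print("Leaves:", node.leaves())
            getNodes(node)
        else:
            print("Word:", node)

getNodes(tree)
```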
Melroy van den Berg answered Sep 22 '22 17:09
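The getNodes answer above prints as it recurses. A variant that yields results instead of printing them can be easier to reuse or test. This is a sketch, not the answerer's code: the name `walk` is hypothetical, and the tree is built by hand from the chunked output shown in the question, with (word, tag) tuples as leaves.

```python
from nltk import Tree

def walk(tree):
    """Depth-first, pre-order: yield (label, leaves) for every subtree below `tree`."""
    for child in tree:
        if isinstance(child, Tree):
            yield child.label(), child.leaves()
            yield from walk(child)  # recurse into nested subtrees

# Hand-built copy of the chunked sentence from the question (hypothetical data).
nvn = Tree("NVN", [
    Tree("NP", [("The_Pigs", "NNS")]),
    ("are", "VBP"),
    Tree("NP", [("a", "DT"), ("Bristol-based", "JJ"), ("punk", "NN"),
                ("rock", "NN"), ("band", "NN")]),
])
sentence = Tree("S", [nvn, ("that", "WDT"), ("formed", "VBN"),
                      ("in", "IN"), ("1977", "CD"), (".", ".")])

for label, leaves in walk(sentence):
    # drop the tags and keep only the words
    print(label, [word for word, tag in leaves])
```

Returning a generator keeps traversal separate from what you do with each node, so the caller decides whether to print, filter, or collect.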


You could, of course, write your own depth-first search, but there is an easier (better) way: if you want every subtree rooted at NVN, use the Tree's subtrees() method with its filter parameter.

```python
>>> print(t)
(S
  (NVN
    (NP The_Pigs/NNS)
    are/VBP
    (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
  that/WDT
  formed/VBN
  in/IN
  1977/CD
  ./.)
>>> for i in t.subtrees(filter=lambda x: x.label() == 'NVN'):
...     print(i)
...
(NVN
  (NP The_Pigs/NNS)
  are/VBP
  (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
```
Peter Enns answered Sep 22 '22 17:09
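Once an NVN subtree has been found with a subtrees() filter, the three parts the question asks for are simply its children, so they can be unpacked directly. This sketch assumes NLTK 3 (label() instead of the older .node attribute) and relies on the grammar guaranteeing that every NVN has exactly three children: an NP subtree, a (word, tag) verb leaf, and another NP subtree. The tagged tokens are copied from the question's output so no tagger is needed.

```python
import nltk

grammar = r"""
NP:  {<DT>*(<NN.*>|<JJ.*>)*<NN.*>}
NVN: {<NP><VB.*><NP>}
"""
chunker = nltk.RegexpParser(grammar)

# (word, tag) pairs copied from the chunked output shown in the question.
tagged = [("The_Pigs", "NNS"), ("are", "VBP"), ("a", "DT"),
          ("Bristol-based", "JJ"), ("punk", "NN"), ("rock", "NN"),
          ("band", "NN"), ("that", "WDT"), ("formed", "VBN"),
          ("in", "IN"), ("1977", "CD"), (".", ".")]
tree = chunker.parse(tagged)

for nvn in tree.subtrees(filter=lambda t: t.label() == "NVN"):
    # A Tree is a list of its children: NP subtree, verb leaf, NP subtree.
    left_np, verb, right_np = nvn
    left = " ".join(word for word, tag in left_np.leaves())
    right = " ".join(word for word, tag in right_np.leaves())
    print(left, "|", verb[0], "|", right)
    # prints: The_Pigs | are | a Bristol-based punk rock band
```

If the grammar could ever produce a different number of children, index defensively (for example, check len(nvn) and each child's type) instead of unpacking.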