I've chunked a sentence using:
    import nltk

    grammar = '''
        NP: {<DT>*(<NN.*>|<JJ.*>)*<NN.*>}
        NVN: {<NP><VB.*><NP>}
    '''
    chunker = nltk.chunk.RegexpParser(grammar)
    tree = chunker.parse(tagged)  # tagged: a list of (word, tag) tuples
    print(tree)
The result looks like:
(S (NVN (NP The_Pigs/NNS) are/VBP (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN)) that/WDT formed/VBN in/IN 1977/CD ./.)
But now I'm stuck trying to figure out how to navigate that. I want to be able to find the NVN subtree, and access the left-side noun phrase ("The_Pigs"), the verb ("are") and the right-side noun phrase ("a Bristol-based punk rock band"). How do I do that?
A Tree represents a hierarchical grouping of leaves and subtrees. For example, each constituent in a syntax tree is represented by a single Tree. A tree's children are encoded as a list of leaves and subtrees, where a leaf is a basic (non-tree) value; and a subtree is a nested Tree.
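To make that description concrete, here is a minimal sketch that builds the chunk tree from the question by hand. It assumes NLTK 3, where `Tree` subclasses `list` and the label is read with `label()`; the variable names (`np_left`, `np_right`, `nvn`, `sentence`) are made up for illustration.

```python
from nltk import Tree

# Leaves of a chunk tree are (word, tag) tuples; subtrees are nested Tree objects.
np_left = Tree('NP', [('The_Pigs', 'NNS')])
np_right = Tree('NP', [('a', 'DT'), ('Bristol-based', 'JJ'),
                       ('punk', 'NN'), ('rock', 'NN'), ('band', 'NN')])
nvn = Tree('NVN', [np_left, ('are', 'VBP'), np_right])
sentence = Tree('S', [nvn, ('that', 'WDT'), ('formed', 'VBN'),
                      ('in', 'IN'), ('1977', 'CD'), ('.', '.')])

print(sentence.label())               # label of the root node
print(len(sentence))                  # number of direct children (subtrees and leaves)
print(isinstance(sentence[0], Tree))  # the first child is a subtree
print(sentence[1])                    # the second child is a plain leaf tuple
print(sentence[0][1])                 # children can be indexed like nested lists
print(sentence.leaves())              # every leaf, in left-to-right order
```

Because a `Tree` behaves like a list of its children, ordinary indexing, slicing, `len()` and iteration all work on it.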
NLTK Parsers. Classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as “parsing” the text, and the resulting tree structures are called the text's “parses”.
Try:
    import nltk

    ROOT = 'ROOT'
    tree = ...

    def getNodes(parent):
        for node in parent:
            if isinstance(node, nltk.Tree):
                if node.label() == ROOT:
                    print("======== Sentence =========")
                    # join() assumes the leaves are plain strings, not (word, tag) tuples
                    print("Sentence:", " ".join(node.leaves()))
                else:
                    print("Label:", node.label())
                    print("Leaves:", node.leaves())
                # recurse into the subtree
                getNodes(node)
            else:
                print("Word:", node)

    getNodes(tree)
You could, of course, write your own depth-first search, but there is an easier (better) way. If you want every subtree rooted at NVN, use Tree's subtrees method with the filter parameter defined.
    >>> print(t)
    (S (NVN (NP The_Pigs/NNS) are/VBP (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN)) that/WDT formed/VBN in/IN 1977/CD ./.)
    >>> # NLTK 3 reads the node label with x.label(); NLTK 2 used x.node
    >>> for i in t.subtrees(filter=lambda x: x.label() == 'NVN'):
    ...     print(i)
    ...
    (NVN (NP The_Pigs/NNS) are/VBP (NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN))
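Once you have the NVN subtree, its three children can be unpacked directly, since the grammar in the question always produces NP, verb, NP in that order. The following is a rough sketch rather than a definitive recipe: it assumes NLTK 3, `split_nvn` is a hypothetical helper name, and the tree is rebuilt from its printed form with `Tree.fromstring` and `nltk.tag.str2tuple` only so the example runs on its own.

```python
from nltk import Tree
from nltk.tag import str2tuple

# Rebuild the chunk tree from its printed form; str2tuple turns "are/VBP"
# into ('are', 'VBP'), matching the leaves a RegexpParser would produce.
t = Tree.fromstring(
    "(S (NVN (NP The_Pigs/NNS) are/VBP "
    "(NP a/DT Bristol-based/JJ punk/NN rock/NN band/NN)) "
    "that/WDT formed/VBN in/IN 1977/CD ./.)",
    read_leaf=str2tuple,
)

def split_nvn(nvn):
    """Return (left noun phrase, verb, right noun phrase) as plain strings.

    Assumes the NVN subtree has exactly three children: NP, verb leaf, NP.
    """
    left_np, verb, right_np = nvn
    left = " ".join(word for word, tag in left_np.leaves())
    right = " ".join(word for word, tag in right_np.leaves())
    return left, verb[0], right

# Collect one (left, verb, right) triple per NVN subtree in the sentence.
results = [split_nvn(st)
           for st in t.subtrees(filter=lambda st: st.label() == 'NVN')]
print(results)
```

If you need the tags as well as the words, keep `left_np.leaves()` and `right_np.leaves()` as they are instead of joining the words.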