This may be a silly question, but how does one iterate through a parse tree as an output of an NLP parser (like Stanford NLP)? It's all nested brackets, which is neither an array
nor a dictionary
or any other collection type I've used.
(ROOT\n (S\n (PP (IN As)\n (NP (DT an) (NN accountant)))\n (NP (PRP I))\n (VP (VBP want)\n (S\n (VP (TO to)\n (VP (VB make)\n (NP (DT a) (NN payment))))))))
This particular output format of the Stanford Parser is call the "bracketed parse (tree)". It is supposed to be read as a graph with
ROOT
(In this case you can read it as a Directed Acyclic Graph (DAG) since it's unidirectional and non-cyclic)
There are libraries out there to read bracketed parse, e.g. in NLTK
's nltk.tree.Tree
(http://www.nltk.org/howto/tree.html):
>>> from nltk.tree import Tree
>>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
>>> parsetree = Tree.fromstring(output)
>>> print parsetree
(ROOT
(S
(PP (IN As) (NP (DT an) (NN accountant)))
(NP (PRP I))
(VP
(VBP want)
(S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
>>> parsetree.pretty_print()
ROOT
|
S
______________________|________
| | VP
| | ________|____
| | | S
| | | |
| | | VP
| | | ________|___
PP | | | VP
___|___ | | | ________|___
| NP NP | | | NP
| ___|______ | | | | ___|_____
IN DT NN PRP VBP TO VB DT NN
| | | | | | | | |
As an accountant I want to make a payment
>>> parsetree.leaves()
['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
Note that if you're interested in specific nodes in the tree, identified by regex-like rules, you can use this very, very hand class to extract all such nodes using a regex-like matcher:
http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/tregex/TregexPattern.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With