
Grammar rule extraction from parsed result

I get the following result when I run the Stanford parser from NLTK:

(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))

but I need it in this form:

S -> VP
VP -> VB NP ADVP
VB -> get
PRP -> me
RB -> now

How can I get this result, perhaps using a recursive function? Is there already a built-in function for this?

asked Oct 15 '15 at 05:10 by aman



1 Answer

First, to navigate a tree, see "How to iterate through all nodes of a tree?" and "How to navigate a nltk.tree.Tree?":

>>> from nltk.tree import Tree
>>> bracket_parse = "(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))"
>>> ptree = Tree.fromstring(bracket_parse)
>>> ptree
Tree('S', [Tree('VP', [Tree('VB', ['get']), Tree('NP', [Tree('PRP', ['me'])]), Tree('ADVP', [Tree('RB', ['now'])])])])
>>> for subtree in ptree.subtrees():
...     print(subtree)
... 
(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))
(VP (VB get) (NP (PRP me)) (ADVP (RB now)))
(VB get)
(NP (PRP me))
(PRP me)
(ADVP (RB now))
(RB now)
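Since subtrees() already visits every node in pre-order, one way to get the rules in the exact form the question asks for (unquoted words, e.g. "VB -> get") is to read each subtree's label and its children's labels. This is a minimal sketch assuming NLTK 3 with Tree.fromstring and Tree.label(); the variable name rules is just for illustration:

```python
from nltk.tree import Tree

ptree = Tree.fromstring("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")

rules = []
for subtree in ptree.subtrees():  # pre-order: parent before its children
    # A child is either a Tree (use its label) or a leaf string (the word itself)
    rhs = [c.label() if isinstance(c, Tree) else c for c in subtree]
    rules.append(f"{subtree.label()} -> {' '.join(rhs)}")

print("\n".join(rules))
```

This prints one rule per line, from "S -> VP" down to "RB -> now". No explicit recursion is needed because subtrees() does the traversal.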

What you're looking for is Tree.productions(), defined at https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L341:

>>> ptree.productions()
[S -> VP, VP -> VB NP ADVP, VB -> 'get', NP -> PRP, PRP -> 'me', ADVP -> RB, RB -> 'now']

Note that Tree.productions() returns a list of Production objects, not strings; see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L22 and https://github.com/nltk/nltk/blob/develop/nltk/grammar.py#L236.
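Because each rule is a Production rather than a string, you can inspect its parts directly. The sketch below, assuming NLTK 3's nltk.grammar.Production API (lhs(), rhs(), is_lexical(), is_nonlexical()) and Nonterminal.symbol(), separates word-level rules such as VB -> 'get' from phrase-level rules such as VP -> VB NP ADVP. The names first, lexical and phrasal are illustrative only:

```python
from nltk.tree import Tree

ptree = Tree.fromstring("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")
prods = ptree.productions()  # list of nltk.grammar.Production objects

# lhs() is a Nonterminal; rhs() is a tuple of Nonterminals and/or plain-str terminals
first = prods[0]
print(first.lhs().symbol(), [nt.symbol() for nt in first.rhs()])  # S ['VP']

# Lexical rules have a terminal (a str) on the right-hand side, e.g. VB -> 'get'
lexical = [(p.lhs().symbol(), p.rhs()[0]) for p in prods if p.is_lexical()]
# Non-lexical rules have only Nonterminals on the right-hand side
phrasal = [p for p in prods if p.is_nonlexical()]

print(lexical)       # [('VB', 'get'), ('PRP', 'me'), ('RB', 'now')]
print(len(phrasal))  # 4
```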

If you want a string form of the grammar rules, you can either do:

>>> for rule in ptree.productions():
...     print(rule)
... 
S -> VP
VP -> VB NP ADVP
VB -> 'get'
NP -> PRP
PRP -> 'me'
ADVP -> RB
RB -> 'now'

Or

>>> rules = [str(p) for p in ptree.productions()]
>>> rules
['S -> VP', 'VP -> VB NP ADVP', "VB -> 'get'", 'NP -> PRP', "PRP -> 'me'", 'ADVP -> RB', "RB -> 'now'"]
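If you'd rather not depend on NLTK for this step, the recursive function the question mentions can be written directly over the bracketed string. This is a minimal sketch, not NLTK's implementation: parse_bracketed and extract_rules are hypothetical helper names, and it assumes well-formed input with no parentheses inside words:

```python
import re


def parse_bracketed(s):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Leaves (words) are kept as plain strings.
    """
    # Tokens are "(", ")", or any run of non-space, non-parenthesis characters
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())  # recurse into a constituent
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def extract_rules(tree):
    """Recursively yield 'LHS -> RHS' strings in pre-order (parent before children)."""
    label, children = tree
    rhs = [c[0] if isinstance(c, tuple) else c for c in children]
    yield f"{label} -> {' '.join(rhs)}"
    for c in children:
        if isinstance(c, tuple):
            yield from extract_rules(c)


for rule in extract_rules(parse_bracketed("(S (VP (VB get) (NP (PRP me)) (ADVP (RB now))))")):
    print(rule)
```

The output matches the list above, except that words are printed without quotes ("VB -> get"), as in the form the question asks for.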
answered Sep 25 '22 at 16:09 by alvas
