I'm new to Python, NLTK and NLP. I have written a simple grammar, but running the program gives the error below. Please help me solve it.
Grammar:-
S -> NP
NP -> PN|PRO|D[NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM
PP -> P NP
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
NOM -> A NOM|N[NUM=?n]
Code:-
import nltk
grammar = nltk.data.load('file:english_grammer.cfg')
rdparser = nltk.RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)
for tree in trees: print (tree)
Error:-
ValueError: Expected a nonterminal, found: [NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM
I don't think NLTK's plain CFG grammar reader can read nonterminals written with square brackets, which is the format your grammar uses.
First let's try a CFG grammar without the square brackets:
from nltk.grammar import CFG
grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
'''
grammar = CFG.fromstring(grammar_string)
print(grammar)
[out]:
Grammar with 18 productions (start state = S)
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'
PN -> 'dinesh'
PRO -> 'she'
PRO -> 'he'
PRO -> 'we'
A -> 'tall'
A -> 'naughty'
A -> 'long'
A -> 'three'
A -> 'black'
P -> 'with'
P -> 'in'
P -> 'from'
P -> 'at'
QP -> 'some'
Now let's put the square brackets in:
from nltk.grammar import CFG
grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
'''
grammar = CFG.fromstring(grammar_string)
print(grammar)
[out]:
Traceback (most recent call last):
  File "test.py", line 33, in <module>
    grammar = CFG.fromstring(grammar_string)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring
    encoding=encoding)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1273, in read_grammar
    (linenum+1, line, e))
ValueError: Unable to parse line 10: N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
Expected an arrow
Going back to your grammar, it seems you're using the square brackets to denote constraints on the nonterminals. With a plain CFG, one solution is to encode those constraints in the nonterminal names instead, so your CFG rules would look like this:
from nltk.parse import RecursiveDescentParser
from nltk.grammar import CFG
grammar_string = '''
S -> NP
NP -> PN | PRO | D N | D A N | D N PP | QP N | A N | D NOM PP | D NOM
PP -> P NP
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
D -> D_def | D_sg
D_def -> 'the'
D_sg -> 'a'
N -> N_sg | N_pl
N_sg -> 'boy'|'girl'|'room'|'garden'|'hair'
N_pl -> 'dogs'|'cats'
'''
grammar = CFG.fromstring(grammar_string)
rdparser = RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)
for tree in trees:
    print(tree)
[out]:
(S (NP (D (D_sg a)) (N (N_pl dogs))))
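Note that the grammar above still accepts "a dogs", because D and N each expand to either number. If agreement should be enforced while staying with a plain CFG, one option is to keep the number in the nonterminal names all the way up to the NP rule. The following is only a minimal sketch under assumptions: the vocabulary is cut down, and the D_pl category and the split NP rule are illustrative names that are not part of the original grammar.

```python
from nltk.grammar import CFG
from nltk.parse import RecursiveDescentParser

# Assumption: reduced vocabulary; D_pl and the split NP rule are illustrative.
agreement_grammar = CFG.fromstring("""
S -> NP
NP -> D_sg N_sg | D_pl N_pl
D_sg -> 'a' | 'the'
D_pl -> 'the'
N_sg -> 'boy' | 'girl'
N_pl -> 'dogs' | 'cats'
""")

agreement_parser = RecursiveDescentParser(agreement_grammar)


def cfg_parses(sentence):
    """Return all parse trees for a whitespace-tokenised sentence."""
    return list(agreement_parser.parse(sentence.split()))


# Agreeing pairs should parse; the mismatched "a dogs" should not.
for sentence in ["a boy", "the dogs", "a dogs"]:
    print(sentence, "->", len(cfg_parses(sentence)), "parse(s)")
```

The cost of this approach is that every rule mentioning D or N has to be duplicated per number value, which is the bookkeeping that feature grammars are meant to avoid.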
It looks like you're trying to use NLTK's feature grammars, which do use the square bracket syntax to denote features and feature agreement. Such a grammar has to be read as a FeatureGrammar rather than a CFG, and parsed with a feature-aware parser such as FeatureEarleyChartParser (as opposed to RecursiveDescentParser).
From the NLTK documentation:
>>> from __future__ import print_function
>>> import nltk
>>> from nltk import grammar, parse
>>> g = """
... % start DP
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
... D[AGR=[NUM='pl', PERS=1]] -> 'we'
... D[AGR=[PERS=2]] -> 'you'
... N[AGR=[NUM='sg', GND='m']] -> 'boy'
... N[AGR=[NUM='pl', GND='m']] -> 'boys'
... N[AGR=[NUM='sg', GND='f']] -> 'girl'
... N[AGR=[NUM='pl', GND='f']] -> 'girls'
... N[AGR=[NUM='sg']] -> 'student'
... N[AGR=[NUM='pl']] -> 'students'
... """
>>> grammar = grammar.FeatureGrammar.fromstring(g)
>>> tokens = 'these girls'.split()
>>> parser = parse.FeatureEarleyChartParser(grammar)
>>> trees = parser.parse(tokens)
>>> for tree in trees: print(tree)
(DP[AGR=[GND='f', NUM='pl', PERS=3]]
  (D[AGR=[NUM='pl', PERS=3]] these)
  (N[AGR=[GND='f', NUM='pl']] girls))
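Applied to the grammar from the question, the same approach might look like the sketch below. It rests on a few assumptions: only a subset of the original rules is kept, the alternatives for NP are written one per line, and the grammar is inlined as a string rather than loaded from english_grammer.cfg. The helper name feature_parses is made up for illustration.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Assumption: a trimmed version of the question's rules, inlined as a string.
feature_grammar = FeatureGrammar.fromstring("""
% start S
S -> NP
NP -> PN
NP -> PRO
NP -> D[NUM=?n] N[NUM=?n]
NP -> D[NUM=?n] A N[NUM=?n]
PN -> 'saumya' | 'dinesh'
PRO -> 'she' | 'he' | 'we'
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy' | 'girl'
N[NUM=pl] -> 'dogs' | 'cats'
A -> 'tall' | 'black'
""")

feature_parser = FeatureEarleyChartParser(feature_grammar)


def feature_parses(sentence):
    """Return all parse trees for a whitespace-tokenised sentence."""
    return list(feature_parser.parse(sentence.split()))


# "a dogs" mixes NUM=sg with NUM=pl, so unification should leave no parse.
for sentence in ["a boy", "the tall dogs", "a dogs"]:
    trees = feature_parses(sentence)
    print(sentence, "->", len(trees), "parse(s)")
    for tree in trees:
        print(tree)
```

Because 'the' carries no NUM feature, it unifies with both singular and plural nouns, while 'a' is restricted to singular ones. When loading from a file instead, nltk.data.load appears to choose the reader from the file extension, so saving the rules with an .fcfg extension (rather than .cfg) should give a FeatureGrammar.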