
Simple grammar gives ValueError in Python

I'm new to Python, NLTK and NLP. I have written a simple grammar, but running the program gives the error below. Please help me fix it.

Grammar:-

S -> NP
NP -> PN|PRO|D[NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM
PP -> P NP
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
NOM -> A NOM|N[NUM=?n]

Code:-

import nltk

grammar = nltk.data.load('file:english_grammer.cfg')
rdparser = nltk.RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)

for tree in trees: print (tree)

Error:-

ValueError: Expected a nonterminal, found: [NUM=?n] N[NUM=?n]|D[NUM=?n] A N[NUM=?n]|D[NUM=?n] N[NUM=?n] PP|QP N[NUM=?n]|A N[NUM=?n]|D[NUM=?n] NOM PP|D[NUM=?n] NOM

Chandana Indisooriya asked Feb 11 '23 18:02


2 Answers

I don't think NLTK's plain CFG reader (CFG.fromstring) can read non-terminals written with square brackets, which is the format your grammar uses.

First let's try a CFG grammar without the square brackets:

from nltk.grammar import CFG

grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
'''

grammar = CFG.fromstring(grammar_string)
print(grammar)

[out]:

Grammar with 18 productions (start state = S)
    S -> NP
    PP -> P NP
    D -> 'the'
    PN -> 'saumya'
    PN -> 'dinesh'
    PRO -> 'she'
    PRO -> 'he'
    PRO -> 'we'
    A -> 'tall'
    A -> 'naughty'
    A -> 'long'
    A -> 'three'
    A -> 'black'
    P -> 'with'
    P -> 'in'
    P -> 'from'
    P -> 'at'
    QP -> 'some'

Now let's put the square brackets in:

from nltk.grammar import CFG

grammar_string = '''
S -> NP
PP -> P NP
D -> 'the'
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'
N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
N[NUM=pl] -> 'dogs'|'cats'
'''

grammar = CFG.fromstring(grammar_string)
print(grammar)

[out]:

Traceback (most recent call last):
  File "test.py", line 33, in <module>
    grammar = CFG.fromstring(grammar_string)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 519, in fromstring
    encoding=encoding)
  File "/usr/local/lib/python2.7/dist-packages/nltk/grammar.py", line 1273, in read_grammar
    (linenum+1, line, e))
ValueError: Unable to parse line 10: N[NUM=sg] -> 'boy'|'girl'|'room'|'garden'|'hair'
Expected an arrow

Going back to your grammar, it seems you're using the square brackets to mark constrained and unconstrained non-terminals, so one solution would be:

  • use an underscore suffix for constrained non-terminals, and
  • add a rule that expands each unconstrained non-terminal into its constrained variants.

So your CFG rules would look like this:

from nltk.parse import RecursiveDescentParser
from nltk.grammar import CFG

grammar_string = '''
S -> NP
NP -> PN | PRO | D N | D A N | D N PP | QP N | A N | D NOM PP | D NOM

PP -> P NP
PN -> 'saumya'|'dinesh'
PRO -> 'she'|'he'|'we'
A -> 'tall'|'naughty'|'long'|'three'|'black'
P -> 'with'|'in'|'from'|'at'
QP -> 'some'

D -> D_def | D_sg
D_def -> 'the'
D_sg -> 'a'

N -> N_sg | N_pl
N_sg -> 'boy'|'girl'|'room'|'garden'|'hair'
N_pl -> 'dogs'|'cats'
'''

grammar = CFG.fromstring(grammar_string)

rdparser = RecursiveDescentParser(grammar)
sent = "a dogs".split()
trees = rdparser.parse(sent)

for tree in trees:
    print (tree)

[out]:

(S (NP (D (D_sg a)) (N (N_pl dogs))))
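
Note that the grammar above still accepts "a dogs", so the underscore names alone do not enforce number agreement. If you want to stay with a plain CFG, one option is to let NP combine only matching variants. This is a rough sketch under my own assumptions: the split of 'the' into both D_sg and D_pl and the helper name parse_trees are illustrative choices, not part of your original grammar, and only the D N pattern is covered.

```python
from nltk.grammar import CFG
from nltk.parse import RecursiveDescentParser

# Assumed split: each category that takes part in agreement gets a singular
# and a plural variant, and NP only combines variants with the same number.
AGREEMENT_GRAMMAR = CFG.fromstring("""
S -> NP
NP -> D_sg N_sg | D_pl N_pl
D_sg -> 'a' | 'the'
D_pl -> 'the'
N_sg -> 'boy' | 'girl' | 'room' | 'garden' | 'hair'
N_pl -> 'dogs' | 'cats'
""")


def parse_trees(sentence):
    """Return every parse tree the grammar licenses for the sentence."""
    parser = RecursiveDescentParser(AGREEMENT_GRAMMAR)
    return list(parser.parse(sentence.split()))


for sentence in ("a boy", "the dogs", "a dogs"):
    print(sentence, "->", len(parse_trees(sentence)), "parse(s)")
```

With this split, "a dogs" should come back with no trees, while "a boy" and "the dogs" still parse. The downside is that every agreeing rule has to be written once per number value, which is the duplication feature grammars are meant to avoid.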
alvas answered Mar 05 '23 03:03


It looks like you're trying to use NLTK's feature grammars, which do use square-bracket syntax to denote features and feature agreement. To parse with a feature grammar, load it as a FeatureGrammar and use a feature-aware parser such as FeatureEarleyChartParser (as opposed to RecursiveDescentParser).

From the NLTK documentation:

>>> from __future__ import print_function
>>> import nltk
>>> from nltk import grammar, parse
>>> g = """
... % start DP
... DP[AGR=?a] -> D[AGR=?a] N[AGR=?a]
... D[AGR=[NUM='sg', PERS=3]] -> 'this' | 'that'
... D[AGR=[NUM='pl', PERS=3]] -> 'these' | 'those'
... D[AGR=[NUM='pl', PERS=1]] -> 'we'
... D[AGR=[PERS=2]] -> 'you'
... N[AGR=[NUM='sg', GND='m']] -> 'boy'
... N[AGR=[NUM='pl', GND='m']] -> 'boys'
... N[AGR=[NUM='sg', GND='f']] -> 'girl'
... N[AGR=[NUM='pl', GND='f']] -> 'girls'
... N[AGR=[NUM='sg']] -> 'student'
... N[AGR=[NUM='pl']] -> 'students'
... """
>>> grammar = grammar.FeatureGrammar.fromstring(g)
>>> tokens = 'these girls'.split()
>>> parser = parse.FeatureEarleyChartParser(grammar)
>>> trees = parser.parse(tokens)
>>> for tree in trees: print(tree)
(DP[AGR=[GND='f', NUM='pl', PERS=3]]
  (D[AGR=[NUM='pl', PERS=3]] these)
  (N[AGR=[GND='f', NUM='pl']] girls))
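
Applied to the grammar in the question, the same approach might look like the sketch below. It is a trimmed version that keeps only the D N and D A N patterns and a few of the words, and the helper name count_parses is made up for illustration. The feature syntax (NUM=sg, NUM=?n) is kept as the question wrote it.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Trimmed version of the question's grammar (an assumption for brevity):
# only the determiner-noun patterns are kept.
GRAMMAR_STRING = """
% start S
S -> NP
NP -> D[NUM=?n] N[NUM=?n] | D[NUM=?n] A N[NUM=?n]
D[NUM=sg] -> 'a'
D -> 'the'
N[NUM=sg] -> 'boy' | 'girl'
N[NUM=pl] -> 'dogs' | 'cats'
A -> 'tall' | 'black'
"""

grammar = FeatureGrammar.fromstring(GRAMMAR_STRING)
parser = FeatureEarleyChartParser(grammar)


def count_parses(sentence):
    """Count the parse trees found for a whitespace-tokenised sentence."""
    return len(list(parser.parse(sentence.split())))


for sentence in ("a boy", "the black cats", "a dogs"):
    print(sentence, "->", count_parses(sentence), "parse(s)")
```

Because 'a' is marked NUM=sg and 'dogs' is marked NUM=pl, the shared variable ?n cannot unify for "a dogs", so no tree should be produced. 'the' carries no NUM feature and therefore combines with either number.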
dantiston answered Mar 05 '23 03:03