Python and NLTK: How to analyze sentence grammar?

Tags: tree, python-2.7, nlp, nltk

I have this code, which should show the syntactic structure of the sentence according to a defined grammar. However, it returns an empty list []. What am I missing or doing wrong?

import nltk

grammar = nltk.parse_cfg("""
S -> NP VP 
PP -> P NP
NP -> Det N | Det N PP 
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
P -> 'or' | 'and'
""")

def main():
    sent = "Kim arrived or Dana left and everyone cheered".split()
    parser = nltk.ChartParser(grammar)
    trees = parser.nbest_parse(sent)
    for tree in trees:
        print tree

if __name__ == '__main__':
    main()

asked Jan 07 '14 22:01

Helena

2 Answers

Let's do some reverse engineering:

>>> import nltk
>>> grammar = nltk.parse_cfg("""
... NP -> Det N | Det N PP
... N -> 'Kim' | 'Dana' | 'everyone'
... """)
>>> sent = "Kim".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]

It seems the rules can't recognize even the first word as an NP. So let's try adding NP -> N:

>>> import nltk
>>> grammar = nltk.parse_cfg("""
... NP -> Det N | Det N PP | N
... N -> 'Kim' | 'Dana' | 'everyone'
... """)
>>> sent = "Kim".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[Tree('NP', [Tree('N', ['Kim'])])]

So now it's working. Let's continue with the next words ("Kim arrived", then "Kim arrived or"):

>>> import nltk
>>> grammar = nltk.parse_cfg("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | VP PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' |'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>> 
>>> sent = "Kim arrived or".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
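One way to see why "Kim arrived" gets no parse is to compute the length of the shortest word string each symbol can derive. The sketch below is plain Python 3 (no NLTK); read_rules and shortest_yield are hypothetical helpers written for this illustration, and the rule reader only understands the simple "A -> B C | 'word'" format used here. With VP -> V NP | VP PP, the shortest VP is two words and the shortest S is three (N V N), so a two-word sentence can never be an S:

```python
import math

# The grammar from the step above (VP -> V NP | VP PP, NP -> ... | N).
GRAMMAR = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | N
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
P -> 'or' | 'and'
"""

def read_rules(text):
    """Map each left-hand side to a list of right-hand-side tuples (simple format only)."""
    rules = {}
    for line in text.strip().splitlines():
        if "->" not in line:
            continue
        lhs, rhs = line.split("->")
        for alt in rhs.split("|"):
            rules.setdefault(lhs.strip(), []).append(tuple(alt.split()))
    return rules

def shortest_yield(rules):
    """Fixpoint: shortest number of words each nonterminal can derive.

    A quoted terminal counts as 1 word; an undefined symbol (here Det) stays infinite.
    """
    best = {lhs: math.inf for lhs in rules}

    def length(sym):
        if sym.startswith("'"):
            return 1
        return best.get(sym, math.inf)

    changed = True
    while changed:
        changed = False
        for lhs, alts in rules.items():
            for rhs in alts:
                n = sum(length(s) for s in rhs)
                if n < best[lhs]:
                    best[lhs] = n
                    changed = True
    return best

lengths = shortest_yield(read_rules(GRAMMAR))
print(lengths["NP"], lengths["VP"], lengths["S"])  # 1 2 3
```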

It seems there is no way to get a VP, with or without the P: a V either needs an NP after it (VP -> V NP), or it has to already be a VP before it can take a PP (VP -> VP PP), and no rule turns a bare V into a VP. So let's relax the rules and say VP -> V PP instead of VP -> VP PP:

>>> import nltk
>>> grammar = nltk.parse_cfg("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | V PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' |'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived or Dana".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[Tree('S', [Tree('NP', [Tree('N', ['Kim'])]), Tree('VP', [Tree('V', ['arrived']), Tree('PP', [Tree('P', ['or']), Tree('NP', [Tree('N', ['Dana'])])])])])]

Okay, we're getting closer, but it seems the next word breaks the CFG rules again:

>>> import nltk
>>> grammar = nltk.parse_cfg("""
... S -> NP VP
... PP -> P NP
... NP -> Det N | Det N PP | N
... VP -> V NP | V PP
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' |'cheered'
... P -> 'or' | 'and'
... """)
>>> sent = "Kim arrived or Dana left".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>> sent = "Kim arrived or Dana left and".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>> 
>>> sent = "Kim arrived or Dana left and everyone".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
>>> 
>>> sent = "Kim arrived or Dana left and everyone cheered".split()
>>> parser = nltk.ChartParser(grammar)
>>> print parser.nbest_parse(sent)
[]
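The reason every prefix longer than four words fails is that, with Det undefined and VP -> VP PP replaced by VP -> V PP, this grammar has no usable recursion left and generates only a finite set of sentences: N V N and N V P N. A small Python 3 sketch (hypothetical helper names, no NLTK) that enumerates everything the grammar derives up to a length bound makes that visible:

```python
from collections import deque

# The "VP -> V NP | V PP" grammar from the step above.
GRAMMAR = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | N
VP -> V NP | V PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
P -> 'or' | 'and'
"""

def read_rules(text):
    """Map each left-hand side to a list of right-hand-side tuples (simple format only)."""
    rules = {}
    for line in text.strip().splitlines():
        if "->" not in line:
            continue
        lhs, rhs = line.split("->")
        for alt in rhs.split("|"):
            rules.setdefault(lhs.strip(), []).append(tuple(alt.split()))
    return rules

def sentences(rules, start="S", max_len=8):
    """Yield every word tuple of at most max_len words derivable from start.

    Expands the leftmost nonterminal breadth-first. No rule is empty, so a
    sentential form never shrinks and longer forms can be discarded.
    Assumes no unary cycles (A -> B -> A), which would loop forever.
    """
    seen = set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((k for k, s in enumerate(form) if not s.startswith("'")), None)
        if idx is None:  # all terminals: a finished sentence
            words = tuple(s.strip("'") for s in form)
            if words not in seen:
                seen.add(words)
                yield words
            continue
        # Undefined symbols such as Det have no rules, so that form simply dies.
        for rhs in rules.get(form[idx], []):
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len:
                queue.append(new)

lang = list(sentences(read_rules(GRAMMAR)))
print(len(lang), max(len(s) for s in lang))  # 81 4
print(tuple("Kim arrived or Dana".split()) in lang)  # True
print(tuple("Kim arrived or Dana left".split()) in lang)  # False
```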

So I hope the examples above show that adapting the rules word by word, from left to right, to cover a language phenomenon is hard.

Instead of working from left to right and ending up with

[[[[[[[[Kim] arrived] or] Dana] left] and] everyone] cheered]

why not try writing more linguistically sound rules that produce these groupings:

1. [[[Kim arrived] or [Dana left]] and [everyone cheered]]
2. [[Kim arrived] or [[Dana left] and [everyone cheered]]]
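Those two groupings are exactly the binary ways to combine three clauses with two conjunctions. As an illustration (plain Python 3; bracketings is a hypothetical helper, not part of NLTK), the sketch below tries each conjunction as the top-level split and recurses on both sides. The number of groupings follows the Catalan numbers (1, 1, 2, 5, 14, ...), which is why coordination becomes ambiguous so quickly:

```python
def bracketings(items):
    """Return every binary grouping of items = [clause, conj, clause, ..., clause]."""
    if len(items) == 1:
        return [items[0]]
    results = []
    # Try each conjunction (odd index) as the top-level split point.
    for k in range(1, len(items), 2):
        for left in bracketings(items[:k]):
            for right in bracketings(items[k + 1:]):
                results.append("[" + left + " " + items[k] + " " + right + "]")
    return results

parts = ["[Kim arrived]", "or", "[Dana left]", "and", "[everyone cheered]"]
for grouping in bracketings(parts):
    print(grouping)

# Number of groupings for 1..5 clauses ("c" = clause, "x" = conjunction).
print([len(bracketings(["c", "x"] * (n - 1) + ["c"])) for n in range(1, 6)])  # [1, 1, 2, 5, 14]
```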

Try this instead:

import nltk
grammar = nltk.parse_cfg("""
S -> CP | VP 
CP -> VP C VP | CP C VP | VP C CP
VP -> NP V 
NP -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
C -> 'or' | 'and'
""")

print "======= Kim arrived ========="
sent = "Kim arrived".split()
parser = nltk.ChartParser(grammar)
for t in parser.nbest_parse(sent):
    print t

print "\n======= Kim arrived or Dana left ========="
sent = "Kim arrived or Dana left".split()
parser = nltk.ChartParser(grammar)
for t in parser.nbest_parse(sent):
    print t 

print "\n=== Kim arrived or Dana left and everyone cheered ===="
sent = "Kim arrived or Dana left and everyone cheered".split()
parser = nltk.ChartParser(grammar)
for t in parser.nbest_parse(sent):
    print t

[out]:

======= Kim arrived =========
(S (VP (NP Kim) (V arrived)))

======= Kim arrived or Dana left =========
(S (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left))))

=== Kim arrived or Dana left and everyone cheered ====
(S
  (CP
    (CP (VP (NP Kim) (V arrived)) (C or) (VP (NP Dana) (V left)))
    (C and)
    (VP (NP everyone) (V cheered))))
(S
  (CP
    (VP (NP Kim) (V arrived))
    (C or)
    (CP
      (VP (NP Dana) (V left))
      (C and)
      (VP (NP everyone) (V cheered)))))

The solution above shows that your CFG rules need to be robust enough to capture not only the full sentence but also its parts: each "NP V" clause is parsed as a VP on its own, and the CP rules then combine those clauses.
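If you want to experiment with grammars like this without depending on a particular NLTK version (the code above uses the NLTK 2 parse_cfg/nbest_parse API, which NLTK 3 replaced), here is a minimal exhaustive chart-style parser sketched in plain Python 3. read_rules, parse_all and show are hypothetical names; the sketch assumes no empty rules and no unary cycles, and memoizes on (symbol, span) the way a chart does, so left-recursive rules such as CP -> CP C VP terminate. It is not NLTK's ChartParser, but on the grammar above it should produce the same trees:

```python
from functools import lru_cache
from itertools import combinations, product

GRAMMAR = """
S -> CP | VP
CP -> VP C VP | CP C VP | VP C CP
VP -> NP V
NP -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
C -> 'or' | 'and'
"""

def read_rules(text):
    """Map each left-hand side to a list of right-hand-side tuples (simple format only)."""
    rules = {}
    for line in text.strip().splitlines():
        if "->" not in line:
            continue
        lhs, rhs = line.split("->")
        for alt in rhs.split("|"):
            rules.setdefault(lhs.strip(), []).append(tuple(alt.split()))
    return rules

def parse_all(rules, tokens, start="S"):
    """Return every parse of tokens as nested tuples, e.g. ('NP', 'Kim').

    Every RHS symbol must cover at least one token, so a left-recursive call
    like CP -> CP C VP always recurses on a strictly shorter span.
    """
    @lru_cache(maxsize=None)
    def trees(sym, i, j):
        if sym.startswith("'"):  # terminal: must match exactly one token
            word = sym.strip("'")
            return (word,) if j - i == 1 and tokens[i] == word else ()
        found = []
        for rhs in rules.get(sym, []):
            # Choose len(rhs) - 1 cut points so each RHS symbol gets a non-empty piece.
            for cuts in combinations(range(i + 1, j), len(rhs) - 1):
                bounds = (i,) + cuts + (j,)
                pieces = [trees(s, a, b) for s, a, b in zip(rhs, bounds, bounds[1:])]
                for children in product(*pieces):
                    found.append((sym,) + children)
        return tuple(found)

    return list(trees(start, 0, len(tokens)))

def show(tree):
    """One-line bracketed form, in the same style as NLTK's tree printing."""
    if isinstance(tree, str):
        return tree
    return "(" + tree[0] + " " + " ".join(show(child) for child in tree[1:]) + ")"

rules = read_rules(GRAMMAR)
for sentence in ["Kim arrived",
                 "Kim arrived or Dana left",
                 "Kim arrived or Dana left and everyone cheered"]:
    print("===", sentence, "===")
    for tree in parse_all(rules, sentence.split()):
        print(show(tree))
```

With a fourth clause ("... or Kim left") this sketch finds 4 trees, not all 5 binary groupings, because these rules have no CP -> CP C CP alternative for a [[a or b] and [c or d]] reading.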

answered Nov 15 '22 12:11

alvas

You don't have Det defined in your grammar, but by your grammar's definition every NP (and consequently every S) has to contain one.

Compare with

>>> grammar = nltk.parse_cfg("""
... S -> NP VP
... NP -> Det N | Det N PP
... VP -> V NP | VP PP
... Det -> 'a' | 'the'
... N -> 'Kim' | 'Dana' | 'everyone'
... V -> 'arrived' | 'left' |'cheered'
... """)
>>>
>>> parser = nltk.ChartParser(grammar)
>>> parser.nbest_parse('the Kim left a Dana'.split())
[Tree('S', [Tree('NP', [Tree('Det', ['the']), Tree('N', ['Kim'])]), Tree('VP', [Tree('V', ['left']), Tree('NP', [Tree('Det', ['a']), Tree('N', ['Dana'])])])])]
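The missing-Det diagnosis can also be checked mechanically with the standard "productive symbols" fixpoint: a nonterminal is productive if some rule rewrites it entirely into terminals and already-productive nonterminals. In this plain Python 3 sketch (read_rules and productive are hypothetical helpers), S is not productive in the question's grammar, which means it derives no sentence at all, so any parser has to return []. Adding Det rules makes NP, VP and S productive:

```python
QUESTION_GRAMMAR = """
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP
VP -> V NP | VP PP
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
P -> 'or' | 'and'
"""

FIXED_GRAMMAR = """
S -> NP VP
NP -> Det N | Det N PP
VP -> V NP | VP PP
Det -> 'a' | 'the'
N -> 'Kim' | 'Dana' | 'everyone'
V -> 'arrived' | 'left' |'cheered'
"""

def read_rules(text):
    """Map each left-hand side to a list of right-hand-side tuples (simple format only)."""
    rules = {}
    for line in text.strip().splitlines():
        if "->" not in line:
            continue
        lhs, rhs = line.split("->")
        for alt in rhs.split("|"):
            rules.setdefault(lhs.strip(), []).append(tuple(alt.split()))
    return rules

def productive(rules):
    """Return the nonterminals that can derive at least one all-terminal string.

    Undefined symbols (e.g. Det in the question) never become productive.
    """
    good = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in rules.items():
            if lhs in good:
                continue
            if any(all(s.startswith("'") or s in good for s in rhs) for rhs in alts):
                good.add(lhs)
                changed = True
    return good

print(sorted(productive(read_rules(QUESTION_GRAMMAR))))  # ['N', 'P', 'V']
print("S" in productive(read_rules(FIXED_GRAMMAR)))  # True
```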

answered Nov 15 '22 10:11

alko
