
pyparsing.ParseException when using parseString (searchString works)

I'm trying to parse some traffic violation sentences using pyparsing. When I use grammar.searchString(sentence) it works, but when I use parseString a ParseException is thrown. Can anybody tell me what is wrong with my code?

from pyparsing import Or, Literal, oneOf, OneOrMore, nums, alphas, Regex, Word, \
    SkipTo, LineEnd, originalTextFor, Optional, ZeroOrMore, Keyword, Group
import pyparsing as pp

from nltk.tag import pos_tag

sentences = ['Failure to control vehicle speed on highway to avoid collision','Failure to stop at stop sign', 'Introducing additives into special fuel by unauthorized person and contrary to regulations', 'driver fail to stop at yield sign at nearest pointf approaching traffic view when req. for safety', 'Operating unregistered motor vehicle on highway', 'Exceeding maximum speed: 39 MPH in a posted 30 MPH zone']


for sentence in sentences:
    words = pos_tag(sentence.split())
    #print words
    verbs = [word for word, pos in words if pos in ['VB','VBD','VBG']]
    nouns = [word for word, pos in words if pos == 'NN']
    adjectives = [word for word, pos in words if pos == 'JJ']

    adjectives.append('great')  # initializing  
    verbs.append('get') # initializing 


    object_generator = oneOf('for to')
    location_generator = oneOf('at in into on onto over within')
    speed_generator = oneOf('MPH KM/H')

    noun = oneOf(nouns)
    adjective = oneOf(adjectives)

    location = location_generator + pp.Group(Optional(adjective) + noun)

    action = oneOf(verbs)
    speed = Word(nums) + speed_generator

    grammar =  action | location | speed

    parsed = grammar.parseString(sentence)

    print parsed

Error traceback

Traceback (most recent call last):
  File "script3.py", line 35, in <module>
    parsed = grammar.parseString(sentence)
  File "/Users/alana/anaconda/lib/python2.7/site-packages/pyparsing.py", line 1032, in parseString
    raise exc
pyparsing.ParseException: Expected Re:('control|avoid|get') (at char 0), (line:1, col:1)

Alana Oliveira asked Oct 17 '22 22:10



1 Answer

searchString is working because it skips over text that doesn't exactly match the grammar. parseString is much more particular, requiring a complete grammar match, beginning right with the first character of the input string. In your case, the grammar is a little difficult to determine, since it is auto-generated based on the NLTK analysis of the input sentence (an interesting approach, btw). If you just print the grammar itself, it may give you some insights into what strings it is looking for. For instance, I'm guessing NLTK will interpret 'Failure' in your first example as a noun, yet none of your 3 expressions in your grammar starts with a noun - therefore, parseString will fail.
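To make the difference concrete, here is a minimal sketch that runs only the speed part of the grammar against one of the sentences from the question. The variable names (matches, parse_failed) are just for illustration; the calls are the same searchString and parseString used above.

```python
from pyparsing import ParseException, Word, nums, oneOf

# Same shape as the "speed" expression in the question.
speed = Word(nums) + oneOf("MPH KM/H")

sentence = "Exceeding maximum speed: 39 MPH in a posted 30 MPH zone"

# searchString scans through the text and collects every place the
# expression matches, skipping everything in between.
matches = speed.searchString(sentence).asList()
print(matches)

# parseString has to match starting at the first character of the input.
# The sentence starts with "Exceeding", not a number, so it raises.
try:
    speed.parseString(sentence)
    parse_failed = False
except ParseException as exc:
    parse_failed = True
    print("parseString failed:", exc)
```

So the same expression that finds two matches with searchString fails with parseString, simply because the match does not begin at the start of the string.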

You'll probably need to do a lot more internal printing of noun, adjective, and verb lists based on what NLTK finds, and then see how that maps to your generated grammar.
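As a rough sketch of that kind of debugging, the snippet below prints the word lists and the grammar, then checks which alternative (if any) can match at the start of the sentence. The word lists here are hard-coded guesses standing in for whatever pos_tag actually returns, and can_start is a hypothetical helper, not part of pyparsing.

```python
from pyparsing import Group, Optional, ParseException, Word, nums, oneOf

# Assumed tagging for 'Failure to stop at stop sign'; NLTK's real
# output may differ, so print your own lists to check.
verbs = ["stop", "get"]
nouns = ["Failure", "sign"]
adjectives = ["great"]

action = oneOf(verbs)
location = (oneOf("at in into on onto over within")
            + Group(Optional(oneOf(adjectives)) + oneOf(nouns)))
speed = Word(nums) + oneOf("MPH KM/H")
grammar = action | location | speed

sentence = "Failure to stop at stop sign"

print("verbs:", verbs)
print("nouns:", nouns)
print("adjectives:", adjectives)
print("grammar:", grammar)


def can_start(expr, text):
    """Hypothetical helper: True if expr matches at the start of text."""
    try:
        expr.parseString(text)
        return True
    except ParseException:
        return False


alternatives = {"action": action, "location": location, "speed": speed}
starts = {name: can_start(expr, sentence) for name, expr in alternatives.items()}
print(starts)
```

If every entry comes back False, as it does with these assumed lists, none of the alternatives can begin at the first word, and parseString on the full grammar will fail at char 0.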

You can also try to combine the results of multiple matches in the sentence using Python's sum() builtin:

grammar =  action("action") | Group(location)("location") | Group(speed)("speed")

#parsed = grammar.parseString(sentence)
parsed = sum(grammar.searchString(sentence))
print(parsed.dump())
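
For reference, here is a self-contained version of the same idea that runs without NLTK. The verb, noun and adjective lists are made up for the example sentence, so treat it as a sketch rather than what your tagging will produce. Note that sum() over an empty result returns the integer 0, so it is worth guarding against sentences with no matches at all.

```python
from pyparsing import Group, Optional, Word, nums, oneOf

# Assumed word lists; in the question these come from NLTK's pos_tag.
verbs = ["Exceeding", "get"]
nouns = ["zone"]
adjectives = ["great"]

action = oneOf(verbs)
location = (oneOf("at in into on onto over within")
            + Group(Optional(oneOf(adjectives)) + oneOf(nouns)))
speed = Word(nums) + oneOf("MPH KM/H")

grammar = action("action") | Group(location)("location") | Group(speed)("speed")

sentence = "Exceeding maximum speed: 39 MPH in a posted 30 MPH zone"

# Each item in the searchString result is its own ParseResults;
# sum() adds them together into a single ParseResults.
found = list(grammar.searchString(sentence))
parsed = sum(found) if found else None

if parsed is not None:
    print(parsed.dump())
```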
PaulMcG answered Oct 21 '22 02:10
