
Trouble doing simple parse in pyparsing

I'm having a basic problem using pyparsing. Below are the test program and the output of a run.

aaron-mac:sql aaron$ more s.py

from pyparsing import *

n = Word(alphanums)
a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
p = Group( a + Suppress(".") )
print a.parseString("first")
print a.parseString("first,second")
print p.parseString("first.")
print p.parseString("first,second.")


aaron-mac:sql aaron$ python s.py
[['first']]
[['first']]
[[['first']]]
Traceback (most recent call last):
 File "s.py", line 15, in <module>
   print p.parseString("first,second.")
 File "/Library/Python/2.6/site-packages/pyparsing.py", line 1032, in parseString
   raise exc
pyparsing.ParseException: Expected "." (at char 5), (line:1, col:6)
aaron-mac:sql aaron$ 

How do I modify the grammar in the test program to parse a list of comma-separated names terminated by a period? I looked in the docs and tried to find a live support list, but decided I was most likely to get a response here.

Aaron Watters asked Nov 22 '11 22:11



2 Answers

The '|' operator creates a MatchFirst expression: the alternatives are tried in the order given, and the first one that matches is used.

Pyparsing works purely left-to-right, applying each parser expression to the input string at the current position. It does not backtrack to try a later alternative once an earlier one has matched; the only lookahead pyparsing does is whatever you write into the parser.

In this expression:

a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))

Let's say n is just a literal "X". If this parser were given the input string "X", it would obviously match the leading, lone n expression. If given the string "X,X,X", it would still match just the leading n, because that is the first alternative in the parser. The rest of the input, ",X,X", is left unparsed, which is why a following "." is then reported as missing.

If you turn the expression around to:

a = Group( Group( n + OneOrMore( Suppress(",") + n )) | n)

then to parse "X" it would first try to match the list, which will fail, and then match the lone n. To parse "X,X,X", the first alternative will be the list expression, which will match.
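To make the ordering effect concrete, here is a minimal sketch that uses the question's own n = Word(alphanums) rather than a literal "X". The names name_list, lone_first and list_first are made up for this illustration, and it assumes a pyparsing release that still accepts the camelCase parseString and asList names.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# The comma-separated list alternative from the question.
name_list = Group(n + OneOrMore(Suppress(",") + n))

# '|' builds a MatchFirst: alternatives are tried in the order written.
lone_first = Group(n | name_list)   # original order from the question
list_first = Group(name_list | n)   # reversed order

# The lone n wins immediately; ",second" is left unparsed.
r1 = lone_first.parseString("first,second").asList()
# The list alternative is tried first and matches the whole input.
r2 = list_first.parseString("first,second").asList()
# The list alternative fails on a single name, so the lone n is used.
r3 = list_first.parseString("first").asList()

print(r1)  # [['first']]
print(r2)  # [[['first', 'second']]]
print(r3)  # [['first']]
```

The first result is the same [['first']] that shows up in the question's output for "first,second".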

If you want the longest alternative to match, use the '^' operator, which gives an Or expression. Or will evaluate all the given alternatives, and then select the longest match.

a = Group( n ^ Group( n + OneOrMore( Suppress(",") + n )))
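As a rough check, the '^' version can be plugged into the question's p expression like this. The name name_list is only introduced here for readability, and the sketch assumes the camelCase parseString and asList names are available in the installed pyparsing.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
name_list = Group(n + OneOrMore(Suppress(",") + n))

# '^' builds an Or: every alternative is tried and the longest match is kept.
a = Group(n ^ name_list)
p = Group(a + Suppress("."))

single = p.parseString("first.").asList()
several = p.parseString("first,second.").asList()

print(single)   # [[['first']]]
print(several)  # [[[['first', 'second']]]]
```

The nested Group calls from the question are kept as they were, which is why the results are so deeply nested. Dropping some of the Group wrappers would flatten them.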

You can also simplify this using the pyparsing helper method delimitedList. Parsing lists of the same expression separated by commas is a common case, so rather than see people have to reinvent expr + ZeroOrMore(Suppress(",") + expr) over and over, I added delimitedList as a standard pyparsing helper. delimitedList("X") would match both "X" and "X,X,X".
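A minimal sketch of the delimitedList approach applied to the question's input follows. The name names is made up for this example. It assumes the function-style delimitedList helper is available, as it is in the pyparsing versions I know of; recent 3.x releases also provide a DelimitedList class that is meant to replace it.

```python
from pyparsing import Suppress, Word, alphanums, delimitedList

# delimitedList uses "," as its default delimiter and drops the
# delimiters from the results.
names = delimitedList(Word(alphanums)) + Suppress(".")

one = names.parseString("first.").asList()
two = names.parseString("first,second.").asList()

print(one)  # ['first']
print(two)  # ['first', 'second']
```

Wrap the expression in Group(...) if you want the names kept together as a nested list.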

PaulMcG answered Nov 15 '22 04:11



If you just want to cover the case of a comma-separated list of names terminated by a period, you can use the following:

from pyparsing import *
p = Word(alphanums)+ZeroOrMore(Suppress(",")+Word(alphanums))+Suppress(".")

With this you get the following results:

>>> print p.parseString("first.")
['first']
>>> print p.parseString("first,second.")
['first', 'second']

The other two inputs in your question, "first" and "first,second", fail with this grammar because they don't end with a period.
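If you want to see that for yourself, here is a small check. The helper parses is hypothetical and written just for this illustration; it only reports whether the grammar accepts a string.

```python
from pyparsing import ParseException, Suppress, Word, ZeroOrMore, alphanums

p = Word(alphanums) + ZeroOrMore(Suppress(",") + Word(alphanums)) + Suppress(".")

def parses(text):
    """Return True if p accepts text, False if it raises ParseException."""
    try:
        p.parseString(text)
        return True
    except ParseException:
        return False

for text in ["first", "first,second", "first.", "first,second."]:
    print(text, parses(text))
```

Only the two inputs that end with a period are accepted.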

jcollado answered Nov 15 '22 06:11
