I'm having a basic problem using pyparsing. Below are the test program and the output of the run.
aaron-mac:sql aaron$ more s.py
from pyparsing import *
n = Word(alphanums)
a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
p = Group( a + Suppress(".") )
print a.parseString("first")
print a.parseString("first,second")
print p.parseString("first.")
print p.parseString("first,second.")
aaron-mac:sql aaron$ python s.py
[['first']]
[['first']]
[[['first']]]
Traceback (most recent call last):
File "s.py", line 15, in <module>
print p.parseString("first,second.")
File "/Library/Python/2.6/site-packages/pyparsing.py", line 1032, in parseString
raise exc
pyparsing.ParseException: Expected "." (at char 5), (line:1, col:6)
aaron-mac:sql aaron$
How do I modify the grammar in the test program to parse a list of comma-separated names terminated by a period? I looked in the docs and tried to find a live support list, but decided I was most likely to get a response here.
The '|' operator creates a MatchFirst expression, in which the alternatives are tried in order, left to right, and the first one that matches is used.
Pyparsing works purely left-to-right, applying parser expressions to the input string as it goes. The only lookahead that pyparsing does is whatever you write into the parser.
In this expression:
a = Group( n | Group( n + OneOrMore( Suppress(",") + n )))
Let's say n is just a literal "X". If this parser was given the input string "X", it would obviously match the leading, lone n expression. If given the string "X,X,X", it would still match just the leading n, because that is the first alternative in the parser.
If you turn the expression around to:
a = Group( Group( n + OneOrMore( Suppress(",") + n )) | n)
then to parse "X" it would first try to match the list, which will fail, and then match the lone n. To parse "X,X,X", the first alternative will be the list expression, which will match.
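The effect of the ordering can be checked directly. This is a minimal sketch, assuming pyparsing is installed; the names name_list, single_first and list_first are made up for illustration and do not appear in the question.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# Illustrative name: two or more comma-separated names.
name_list = Group(n + OneOrMore(Suppress(",") + n))

single_first = Group(n | name_list)  # order used in the question
list_first = Group(name_list | n)    # list alternative tried first

# MatchFirst stops at the first alternative that matches.
print(single_first.parseString("first,second").asList())  # [['first']]
print(list_first.parseString("first,second").asList())    # [[['first', 'second']]]
print(list_first.parseString("first").asList())           # [['first']]
```

With the lone n first, the list alternative is never reached, which is why the trailing "." was reported as missing at char 5.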
If you want the longest alternative to match, use the '^' operator, which gives an Or expression. Or will evaluate all the given alternatives, and then select the longest match.
a = Group( n ^ Group( n + OneOrMore( Suppress(",") + n )))
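As a rough check, here is the grammar from the question with only '|' changed to '^'. It is a sketch that assumes pyparsing is installed; note that the nested Group calls are kept as in the question, so the results are deeply nested.

```python
from pyparsing import Group, OneOrMore, Suppress, Word, alphanums

n = Word(alphanums)
# Or ('^') evaluates both alternatives and keeps the longest match.
a = Group(n ^ Group(n + OneOrMore(Suppress(",") + n)))
p = Group(a + Suppress("."))

print(p.parseString("first.").asList())         # [[['first']]]
print(p.parseString("first,second.").asList())  # [[[['first', 'second']]]]
```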
You can also simplify this using the pyparsing helper method delimitedList. Parsing lists of the same expression separated by commas is a common case, so rather than see people have to reinvent expr + ZeroOrMore(Suppress(",") + expr) over and over, I added delimitedList as a standard pyparsing helper. delimitedList("X") would match both "X" and "X,X,X".
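Applied to the question, a delimitedList-based grammar might look like the sketch below. It assumes pyparsing is installed and relies on the default comma delimiter; the variable name names is illustrative only.

```python
from pyparsing import Suppress, Word, alphanums, delimitedList

# One or more comma-separated names, followed by a required period.
# delimitedList uses "," by default and drops the delimiters from the results.
names = delimitedList(Word(alphanums)) + Suppress(".")

print(names.parseString("first.").asList())         # ['first']
print(names.parseString("first,second.").asList())  # ['first', 'second']
```

No Group is used here, so the names come back as a flat list.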
If you just want to cover the case of a comma-separated list of names terminated by a period, you can use the following:
from pyparsing import *
p = Word(alphanums)+ZeroOrMore(Suppress(",")+Word(alphanums))+Suppress(".")
With this you get the following results:
>>> print p.parseString("first.")
['first']
>>> print p.parseString("first,second.")
['first', 'second']
With this grammar, the other inputs in your question ("first" and "first,second") fail to parse because they don't end with a period, which the final Suppress(".") requires.
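The failure on inputs without a period can be seen with a small wrapper. This is only a sketch, assuming pyparsing is installed; try_parse is a hypothetical helper, not part of pyparsing.

```python
from pyparsing import ParseException, Suppress, Word, ZeroOrMore, alphanums

p = Word(alphanums) + ZeroOrMore(Suppress(",") + Word(alphanums)) + Suppress(".")

def try_parse(text):
    """Hypothetical helper: return the parsed names, or None if parsing fails."""
    try:
        return p.parseString(text).asList()
    except ParseException:
        return None

print(try_parse("first,second."))  # ['first', 'second']
print(try_parse("first,second"))   # None (no terminating period)
print(try_parse("first"))          # None (no terminating period)
```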