I have the following test code for my PyParsing grammar:
from pyparsing import Word, nums, alphas, delimitedList, Group, oneOf
from pprint import pprint

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
string_value = Word(alphas)("string")
int_value = Word(nums).setParseAction(lambda t: int(t[0]))("int")
value = (string_value | int_value)("value")
expression = Group(field + operator + value)("expression")
grammar = Group(delimitedList(expression, delim="&&"))("expr_list")

def test(s):
    print("Parsing '{0}'".format(s))
    tokenized = grammar.parseString(s)
    for f in tokenized:
        e = f.expression
        pprint(dict(e.items()))

if __name__ == "__main__":
    test("foo=1")
    test("foo=1 && bar=2")
    test("foobar=2 && snakes=4")
The output is unexpected: it seems that I only get the last expression in tokenized:
Parsing 'foo=1'
{'field': 'foo', 'int': 1, 'operator': '=', 'value': 1}
Parsing 'foo=1 && bar=2'
{'field': 'bar', 'int': 2, 'operator': '=', 'value': 2}
Parsing 'foobar=2 && snakes=4'
{'field': 'snakes', 'int': 4, 'operator': '=', 'value': 4}
How do I fix this?
Untested, but I think you just need to change:
expression = (field + operator + value)("expression")
to:
expression = Group(field + operator + value)("expression")
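To see why Group matters here, the sketch below (assuming a current pyparsing install; the names flat and grouped are illustrative, not from the question) parses the same input with and without Group around each expression and prints the results as plain lists:

```python
from pyparsing import Word, alphas, nums, oneOf, delimitedList, Group

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
value = (Word(alphas)("string")
         | Word(nums).setParseAction(lambda t: int(t[0]))("int"))("value")

# Without Group, the tokens of every expression are merged into one flat list.
flat = delimitedList(field + operator + value, delim="&&")
# With Group, each field/operator/value triple becomes its own sub-list.
grouped = delimitedList(Group(field + operator + value), delim="&&")

print(flat.parseString("foo=1 && bar=2").asList())     # ['foo', '=', 1, 'bar', '=', 2]
print(grouped.parseString("foo=1 && bar=2").asList())  # [['foo', '=', 1], ['bar', '=', 2]]
```

Without Group there is no per-expression unit to iterate over, which is why the grouping has to be in place before the iteration fix below makes sense.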
EDIT: okay, one other change. Your iteration code looks up items by the name 'expression', but there are multiple items named 'expression' inside the '&&'-delimited list, and the name only keeps the last one. It is simpler not to reference these by their name, but to iterate over the grouped expressions inside 'expr_list':
for f in tokenized['expr_list']:
    field = f['field']
    op = f['operator']
    value = f['value']
    print(field, op, value)
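Put together, the fixed test might look like the sketch below. It keeps the question's grammar unchanged; parse_expressions is a hypothetical helper name that returns the (field, operator, value) triples instead of printing them, so the result is easy to check:

```python
from pyparsing import Word, nums, alphas, delimitedList, Group, oneOf

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
string_value = Word(alphas)("string")
int_value = Word(nums).setParseAction(lambda t: int(t[0]))("int")
value = (string_value | int_value)("value")
expression = Group(field + operator + value)("expression")
grammar = Group(delimitedList(expression, delim="&&"))("expr_list")

def parse_expressions(s):
    """Return one (field, operator, value) tuple per '&&'-separated expression."""
    tokenized = grammar.parseString(s)
    # Iterate over the groups inside 'expr_list' instead of looking them up
    # by the shared 'expression' name, which only keeps the last match.
    return [(f["field"], f["operator"], f["value"]) for f in tokenized["expr_list"]]

if __name__ == "__main__":
    for field_name, op, val in parse_expressions("foo=1 && bar=2"):
        print(field_name, op, val)  # foo = 1, then bar = 2
```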
I usually use the dump method on parsed results to see just how the data has been grouped and named. If I print out tokenized.dump(), I get:
[[['foo', '=', 1], ['bar', '=', 2]]]
- expr_list: [['foo', '=', 1], ['bar', '=', 2]]
  - expression: ['bar', '=', 2]
    - field: bar
    - int: 2
    - operator: =
    - value: 2
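The dump lists only one 'expression': the last group parsed. If you would rather reach every group through that name, pyparsing can be told to keep all matches. This is an alternative the answer does not rely on; the sketch assumes setResultsName's listAllMatches=True option (a trailing '*' on the name, as in expression("expression*"), is the shorthand for the same thing):

```python
from pyparsing import Word, alphas, nums, oneOf, delimitedList, Group

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
value = (Word(alphas)("string")
         | Word(nums).setParseAction(lambda t: int(t[0]))("int"))("value")

# listAllMatches=True makes the 'expression' name accumulate every match
# instead of being overwritten by each later one.
expression = Group(field + operator + value).setResultsName("expression", listAllMatches=True)
grammar = Group(delimitedList(expression, delim="&&"))("expr_list")

tokenized = grammar.parseString("foo=1 && bar=2")
for e in tokenized["expr_list"]["expression"]:
    print(e["field"], e["operator"], e["value"])
```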
I can see that I can get at the 'expr_list' named value. I also see that there is a sub-level 'expression', but since results names are by default unique keys, like in a dict, it only holds the group that was parsed last. I can still access the multiple groups inside 'expr_list', though: if I look at the 0th item (using print(tokenized['expr_list'][0].dump())), I get:
['foo', '=', 1]
- field: foo
- int: 1
- operator: =
- value: 1
So I can iterate over the groups in the 'expr_list' using:
for f in tokenized['expr_list']:
    field = f['field']
    op = f['operator']
    value = f['value']
    print(field, op, value)
and I'll get:
foo = 1
bar = 2
It isn't necessary to put results names on every level of your grammar. In this case, we got the expressions by iterating through expr_list and didn't even use expression. In fact, if you take the Group off the outermost grammar expression, you don't need 'expr_list' either: just write for f in tokenized: and iterate over the results directly.
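A sketch of that simpler variant, assuming the same field, operator and value definitions as the question, with the outer Group and both the 'expr_list' and 'expression' names removed:

```python
from pyparsing import Word, alphas, nums, oneOf, delimitedList, Group

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
value = (Word(alphas)("string")
         | Word(nums).setParseAction(lambda t: int(t[0]))("int"))("value")

# Each expression is still grouped so its field/operator/value stay together.
expression = Group(field + operator + value)
# No outer Group and no 'expr_list' name: the top-level results are the groups themselves.
grammar = delimitedList(expression, delim="&&")

tokenized = grammar.parseString("foobar=2 && snakes=4")
for f in tokenized:
    print(f["field"], f["operator"], f["value"])  # foobar = 2, then snakes = 4
```

The inner Group stays because it is what keeps each expression's names separate; only the outer wrapper is redundant.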
When trying to tease out the contents of your returned ParseResults, the dump method is probably the best tool.