
How to iterate over ParseResults in pyparsing

I have the following test code for my PyParsing grammar:

from pyparsing import Word, nums, alphas, delimitedList, Group, oneOf
from pprint import pprint

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
string_value = Word(alphas)("string")
int_value = Word(nums).setParseAction(lambda t: int(t[0]))("int")
value = (string_value | int_value )("value")
expression = Group(field + operator + value)("expression")
grammar = Group(delimitedList(expression, delim="&&"))("expr_list")

def test(s):
    print("Parsing '{0}'".format(s))
    tokenized = grammar.parseString(s)
    for f in tokenized:
        e = f.expression
        pprint(dict(e.items()))

if __name__ == "__main__":
    test("foo=1")
    test("foo=1 && bar=2")
    test("foobar=2 && snakes=4")

Output is quite unexpected - seems that I only get the last expression in tokenized:

Parsing 'foo=1'
{'field': 'foo', 'int': 1, 'operator': '=', 'value': 1}
Parsing 'foo=1 && bar=2'
{'field': 'bar', 'int': 2, 'operator': '=', 'value': 2}
Parsing 'foobar=2 && snakes=4'
{'field': 'snakes', 'int': 4, 'operator': '=', 'value': 4}

How do I fix this?

Kimvais asked Oct 07 '22 23:10


1 Answer

Untested, but I think you just need to change:

expression = (field + operator + value)("expression")

to:

expression = Group(field + operator + value)("expression")

EDIT: okay, one other change. Your iteration code looks up a single item named 'expression', but there are multiple items with that name inside the '&&'-delimited list. It is simpler not to reference them by name, and instead to iterate over the grouped expressions inside 'expr_list':

for f in tokenized['expr_list']:
    field = f['field']
    op = f['operator']
    value = f['value']
    print(field, op, value)

I usually use the dump method on parsed results to see just how the data has been grouped and named. If I print out tokenized.dump() I get:

[[['foo', '=', 1], ['bar', '=', 2]]]
- expr_list: [['foo', '=', 1], ['bar', '=', 2]]
  - expression: ['bar', '=', 2]
    - field: bar
    - int: 2
    - operator: =
    - value: 2

I can see that I can get at the 'expr_list' named value. I also see that there is a sub-level 'expression', but as these keys are by default unique like in a dict, there is only a value for the group that was parsed last. But I can access the multiple groups inside 'expr_list' - if I look at the 0'th item (using print tokenized['expr_list'][0].dump()), I get:

['foo', '=', 1]
- field: foo
- int: 1
- operator: =
- value: 1

So I can iterate over the groups in the 'expr_list' using:

for f in tokenized['expr_list']:
    field = f['field']
    op = f['operator']
    value = f['value']
    print(field, op, value)

and I'll get:

foo = 1
bar = 2
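
For reference, the grammar from the question and the iteration above can be combined into one script. This is a minimal sketch, assuming Python 3 and the camelCase pyparsing names used in the question; the helper name extract is hypothetical and only there to make the result easy to check.

```python
from pyparsing import Word, nums, alphas, delimitedList, Group, oneOf

# Grammar exactly as in the question.
field = Word(alphas)("field")
operator = oneOf("= +")("operator")
string_value = Word(alphas)("string")
int_value = Word(nums).setParseAction(lambda t: int(t[0]))("int")
value = (string_value | int_value)("value")
expression = Group(field + operator + value)("expression")
grammar = Group(delimitedList(expression, delim="&&"))("expr_list")


def extract(s):
    """Return one (field, operator, value) tuple per parsed expression."""
    tokenized = grammar.parseString(s)
    # Iterate over the groups inside 'expr_list' instead of looking up
    # the repeated name 'expression', which only keeps the last match.
    return [(f["field"], f["operator"], f["value"])
            for f in tokenized["expr_list"]]


rows = extract("foo=1 && bar=2")
for field_name, op, val in rows:
    print(field_name, op, val)
```

Returning plain tuples keeps the caller independent of ParseResults, which may be convenient if the parsed values are passed on to other code.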

It isn't necessary to put results names on every level within your grammar. In this case, we got the expressions by iterating through expr_list and didn't even use expression. In fact, if you remove the Group from the outermost grammar expression, you don't need 'expr_list' either; just iterate with for f in tokenized:.
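
A sketch of that simplification, assuming Python 3: the outer Group and the 'expr_list' and 'expression' names are dropped, and each inner Group is kept so that every expression still carries its own 'field', 'operator' and 'value'. The 'string' and 'int' names are also left out here purely to keep the example short.

```python
from pyparsing import Word, nums, alphas, delimitedList, Group, oneOf

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
int_value = Word(nums).setParseAction(lambda t: int(t[0]))
value = (Word(alphas) | int_value)("value")

# Each expression is still grouped, but neither it nor the list is named.
expression = Group(field + operator + value)
grammar = delimitedList(expression, delim="&&")

tokenized = grammar.parseString("foobar=2 && snakes=4")

# The top-level result is now the sequence of groups itself.
pairs = [(f["field"], f["value"]) for f in tokenized]
print(pairs)
```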

When trying to tease out the contents of your returned ParseResults, the dump method is probably the best tool.
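
On the remark that results names are unique "by default": pyparsing also lets a name keep every match via the listAllMatches argument of setResultsName. The following is a sketch under the same assumptions as before (Python 3, camelCase API names); it is an alternative to iterating over the groups, not what the answer above recommends.

```python
from pyparsing import Word, nums, alphas, delimitedList, Group, oneOf

field = Word(alphas)("field")
operator = oneOf("= +")("operator")
int_value = Word(nums).setParseAction(lambda t: int(t[0]))
value = (Word(alphas) | int_value)("value")

# listAllMatches=True keeps every group parsed under this name,
# instead of only the one that was parsed last.
expression = Group(field + operator + value).setResultsName(
    "expression", listAllMatches=True)
grammar = delimitedList(expression, delim="&&")

tokenized = grammar.parseString("foo=1 && bar=2")
fields = [e["field"] for e in tokenized["expression"]]
print(fields)

# dump() shows how the names and groups ended up being arranged.
print(tokenized.dump())
```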

PaulMcG answered Oct 21 '22 02:10