Pyparsing setParseAction function is getting no arguments

I'm trying to parse a simple key = value query language. I've already done it with a huge monstrosity of a parser, followed by a second pass that cleans up the parse tree. What I'd like instead is a clean parse from the bottom up, which includes things like using sets for the (key, val) pairs so that redundant pairs are eliminated. Although I got it working before, I never felt I fully understood why pyparsing was behaving the way it was, so I piled up workarounds and ended up fighting against the grain.

Currently, here is the beginning of my "simplified" parser:

from pyparsing import *   

bool_act = lambda t: bool(t[0])
int_act  = lambda t: int(t[0])

def keyval_act(instring, loc, tokens):
    return set([(tokens.k, tokens.v)])

def keyin_act(instring, loc, tokens):
    return set([(tokens.k, set(tokens.vs))])

string = (
      Word(alphas + '_', alphanums + '_')
    | quotedString.setParseAction( removeQuotes )
    )
boolean = (
      CaselessLiteral('true')
    | CaselessLiteral('false')
    )
integer = Word(nums).setParseAction( int_act )
value = (
      boolean.setParseAction(bool_act)
    | integer
    | string
    )
keyval = (string('k') + Suppress('=') + value('v')
          ).setParseAction(keyval_act)
keyin = (
    string('k') + Suppress(CaselessLiteral('in')) +
    nestedExpr('{','}', content = delimitedList(value)('vs'))
    ).setParseAction(keyin_act)

grammar = keyin + stringEnd | keyval + stringEnd

Currently, the "grammar" nonterminal is just a stub. I will eventually add nestable conjunctions and disjunctions to the keys so that searches like this can be parsed:

a = 1, b = 2 , c in {1,2,3} | d = 4, ( e = 5 | e = 2, (f = 3, f = 4))

For now though, I am having trouble understanding how pyparsing calls my setParseAction functions. I know there is some magic in terms of how many arguments are passed, but I am getting an error where no arguments are being passed to the function at all. So currently, if I do:

grammar.parseString('hi in {1,2,3}')

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 1021, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 2478, in parseImpl
    ret = e._parse( instring, loc, doActions )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 2351, in parseImpl
    loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPreParse=False )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 921, in _parseNoCache
    tokens = fn( instring, tokensStart, retTokens )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 675, in wrapper
    return func(*args[limit[0]:])
TypeError: keyin_act() takes exactly 3 arguments (0 given)

As you can see from the traceback, I'm using Python 2.6 and pyparsing 1.5.6.

Can anyone give me some insight into why the function isn't getting the right number of arguments?

deontologician asked Apr 16 '12 15:04


2 Answers

Well, the latest version of setParseAction does do some extra magic, but unfortunately at the expense of some development simplicity. The argument detection logic in setParseAction now relies on the raising of exceptions in the parse action until it is called with the correct number of arguments, starting at 3 and working its way down to 0, after which it just gives up and raises the exception you saw.
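The trial-and-error described above can be imitated in a few lines to see how it hides a genuine error. This is only a sketch: `trial_and_error` and `buggy_action` are hypothetical names, the wrapper is a simplified stand-in and not pyparsing's actual code, and it is written for Python 3 rather than the Python 2.6 used in the question.

```python
def trial_and_error(func, max_args=3):
    """Hypothetical stand-in for pyparsing's arity detection (not its real code)."""
    def wrapper(*args):
        for dropped in range(max_args + 1):
            try:
                # Try (instring, loc, tokens), then (loc, tokens), (tokens,), ().
                return func(*args[dropped:])
            except TypeError:
                if dropped == max_args:
                    raise  # out of options: the last failure is what the user sees
    return wrapper


def buggy_action(instring, loc, tokens):
    # Real bug: a set is unhashable, so it cannot be placed inside another set.
    return set([(tokens[0], set(tokens[1:]))])


try:
    trial_and_error(buggy_action)('hi in {1,2,3}', 0, ['hi', 1, 2, 3])
except TypeError as exc:
    masked_message = str(exc)

print(masked_message)
```

The very first call, with all three arguments, fails because of the bug inside the function. The wrapper cannot tell that apart from an arity mismatch, so it keeps dropping arguments, and the message that finally surfaces complains about missing arguments (its exact wording depends on the Python version).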

In this case, however, the exception coming from the parse action was not due to an argument-list mismatch but to a real error in your code. To get a better view of this, insert a generic try-except into your parse action:

def keyin_act(instring, loc, tokens):
    try:
        # Same body as in the question, wrapped so the real error becomes visible
        return set([(tokens.k, set(tokens.vs))])
    except Exception as e:
        print(e)

And you get:

unhashable type: 'set'

The second element of the tuple you are putting into the returned set is itself a set. A set is a mutable container, so it is not hashable and cannot be a member of another set. If you change the inner set to a frozenset instead, then you'll get:

[set([('hi', frozenset([]))])]
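The hashability rule behind this error can be checked in isolation, without pyparsing. A minimal sketch, with `values` and `pair_set` as illustrative names only:

```python
values = [1, 2, 3]

# A tuple is hashable only if everything inside it is hashable.
# A plain set is mutable, so the tuple below cannot go into the outer set.
try:
    pair_set = set([('hi', set(values))])
except TypeError as exc:
    error = str(exc)

# A frozenset is immutable and hashable, so this version works.
pair_set = set([('hi', frozenset(values))])
print(error)
print(('hi', frozenset([1, 2, 3])) in pair_set)
```

The first attempt raises the same "unhashable type" TypeError that the try-except above printed; the second builds the set without complaint.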

Why is the frozenset empty? I suggest you change the location of your results name 'vs' to:

nestedExpr('{','}', content = delimitedList(value))('vs') 

And now the parsed results returned by parsing 'hi in {1,2,3}' are:

[set([('hi', frozenset([([1, 2, 3], {})]))])]

This is something of a mess. If we add this line at the top of your parse action, you'll see what the different named results actually contain:

print(tokens.dump())

We get:

['hi', [1, 2, 3]]
- k: hi
- vs: [[1, 2, 3]]

So 'vs' actually points to a list containing a list, which means we want to build our set from tokens.vs[0], not tokens.vs. With that change, our parsed results look like:

[set([('hi', frozenset([1, 2, 3]))])]

Some other tips on your grammar:

  • Instead of CaselessLiteral, try using CaselessKeyword. Keywords are a better choice for grammar keywords, since they inherently avoid mistaking the leading 'in' of 'inside' for the keyword 'in' in your grammar.

  • Not sure where you are heading with returning sets from the parse actions - for key-value pairs, a tuple will probably be better, since it will preserve the order of tokens. Build up your sets of keys and values in the after-parsing phase of the program.

  • For other grammar debugging tools, check out setDebug and the traceParseAction decorator.
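The difference between literal and keyword matching in the first tip can be illustrated with the standard re module. This is an analogy only, not pyparsing code: the two patterns below are hypothetical stand-ins for the behaviour of CaselessLiteral('in') and CaselessKeyword('in').

```python
import re

# Literal-style match: 'in' is accepted even as the prefix of a longer word.
literal_in = re.compile(r'in', re.IGNORECASE)
# Keyword-style match: 'in' must not be followed by an identifier character.
keyword_in = re.compile(r'in(?![A-Za-z0-9_])', re.IGNORECASE)

literal_hit = literal_in.match('inside {1,2}')   # matches the 'in' of 'inside'
keyword_hit = keyword_in.match('inside {1,2}')   # no match
keyword_ok = keyword_in.match('in {1,2}')        # matches the standalone 'in'

print(literal_hit is not None, keyword_hit is None, keyword_ok.group())
```

The literal-style pattern happily consumes the first two letters of 'inside', which is the kind of mistake the keyword classes are meant to prevent.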

PaulMcG answered Nov 08 '22 09:11


Paul has already explained what the root problem is: the TypeError raised by your parse action confuses pyparsing's automagic way of figuring out the number of arguments your parse action expects.

Here's what I use to avoid this kind of confusion: a decorator that remembers any TypeError thrown by the decorated function and re-raises it if the function is then called again with fewer arguments:

import functools
import inspect
import sys

def parse_action(f):
    """
    Decorator for pyparsing parse actions to ease debugging.

    pyparsing uses trial & error to deduce the number of arguments a parse
    action accepts. Unfortunately any ``TypeError`` raised by a parse action
    confuses that mechanism.

    This decorator replaces the trial & error mechanism with one based on
    reflection. If the decorated function itself raises a ``TypeError`` then
    that exception is re-raised if the wrapper is called with fewer arguments
    than required. This makes sure that the actual ``TypeError`` bubbles up
    from the call to the parse action (instead of the one caused by pyparsing's
    trial & error).
    """
    num_args = len(inspect.getargspec(f).args)
    if num_args > 3:
        raise ValueError('Input function must take at most 3 parameters.')

    @functools.wraps(f)
    def action(*args):
        if len(args) < num_args:
            if action.exc_info:
                raise action.exc_info[0], action.exc_info[1], action.exc_info[2]
        action.exc_info = None
        try:
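            # Pass the trailing arguments in their original order,
            # e.g. (loc, tokens) for a two-parameter action.
            return f(*args[max(0, len(args) - num_args):])
        except TypeError: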
            action.exc_info = sys.exc_info()
            raise

    action.exc_info = None
    return action

Here's how to use it:

from pyparsing import Literal

@parse_action
def my_parse_action(tokens):
    raise TypeError('Ooops')

x = Literal('x').setParseAction(my_parse_action)
x.parseString('x')

This gives you:

Traceback (most recent call last):
  File "test.py", line 49, in <module>
    x.parseString('x')
  File "/usr/local/lib/python2.7/dist-packages/pyparsing-2.0.2-py2.7.egg/pyparsing.py", line 1101, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/usr/local/lib/python2.7/dist-packages/pyparsing-2.0.2-py2.7.egg/pyparsing.py", line 1001, in _parseNoCache
    tokens = fn( instring, tokensStart, retTokens )
  File "/usr/local/lib/python2.7/dist-packages/pyparsing-2.0.2-py2.7.egg/pyparsing.py", line 765, in wrapper
    ret = func(*args[limit[0]:])
  File "test.py", line 33, in action
    return f(*args[:num_args])
  File "test.py", line 46, in my_parse_action
    raise TypeError('Ooops')
TypeError: Ooops

Compare this with the traceback that you get without the @parse_action decoration:

Traceback (most recent call last):
  File "test.py", line 49, in <module>
    x.parseString('x')
  File "/usr/local/lib/python2.7/dist-packages/pyparsing-2.0.2-py2.7.egg/pyparsing.py", line 1101, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/usr/local/lib/python2.7/dist-packages/pyparsing-2.0.2-py2.7.egg/pyparsing.py", line 1001, in _parseNoCache
    tokens = fn( instring, tokensStart, retTokens )
  File "/usr/local/lib/python2.7/dist-packages/pyparsing-2.0.2-py2.7.egg/pyparsing.py", line 765, in wrapper
    ret = func(*args[limit[0]:])
TypeError: my_parse_action() takes exactly 1 argument (0 given)
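The decorator above relies on Python 2 syntax (the three-expression raise) and on inspect.getargspec. For readers on Python 3, the same idea might be sketched as follows. This is a port made under assumptions, not code from the original answer: `saved_exc` and `trial_and_error` are hypothetical names, and `trial_and_error` is only a simplified stand-in for pyparsing's argument detection so that the example runs without pyparsing installed.

```python
import functools
import inspect


def parse_action(f):
    """Python 3 sketch of the decorator idea (a port, not the original code)."""
    num_args = len(inspect.signature(f).parameters)
    if num_args > 3:
        raise ValueError('Input function must take at most 3 parameters.')

    @functools.wraps(f)
    def action(*args):
        # Called again with too few arguments: surface the saved error instead.
        if len(args) < num_args and action.saved_exc is not None:
            raise action.saved_exc
        action.saved_exc = None
        try:
            # Pass the trailing arguments in order, e.g. (loc, tokens).
            return f(*args[max(0, len(args) - num_args):])
        except TypeError as exc:
            action.saved_exc = exc
            raise

    action.saved_exc = None
    return action


def trial_and_error(func, max_args=3):
    """Hypothetical stand-in for pyparsing's arity detection (not its real code)."""
    def wrapper(*args):
        for dropped in range(max_args + 1):
            try:
                return func(*args[dropped:])
            except TypeError:
                if dropped == max_args:
                    raise
    return wrapper


@parse_action
def my_parse_action(tokens):
    raise TypeError('Ooops')


try:
    trial_and_error(my_parse_action)('x', 0, ['x'])
except TypeError as exc:
    surfaced = str(exc)

print(surfaced)
```

With the decorator in place, the error that reaches the caller is the one raised inside the parse action rather than a complaint about the argument count.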

Florian Brucker answered Nov 08 '22 10:11