I'd like to be able to parse two (or any number) of expressions, each with their own set of variable definitions or other context.
There doesn't seem to be an obvious way to associate a context with a particular invocation of pyparsing.ParseExpression.parseString(). The most natural way seems to be to use instance methods of some class as the parse actions. The problem with this approach is that the grammar must be redefined for each parse context (for instance, in the class's __init__), which seems terribly inefficient.
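To make the pattern concrete, here is a rough sketch of what I mean (the class and rule names are just illustrative); every new context pays the cost of rebuilding the grammar so that the instance's bound methods can be attached as parse actions:

from pyparsing import Word, alphas

class Evaluator(object):
    def __init__(self, variables):
        self.variables = variables            # the per-parse context
        # the whole grammar is rebuilt for every instance, just so the
        # bound method below can serve as a parse action
        ident = Word(alphas)
        ident.setParseAction(self.check_name)
        self.expr = ident + '+' + ident

    def check_name(self, instring, loc, tok):
        if tok[0] not in self.variables:
            raise KeyError('unknown name: %r' % tok[0])

    def parse(self, s):
        return self.expr.parseString(s)

# each context constructs the grammar from scratch
Evaluator({'x', 'y'}).parse('x + y')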
Using pyparsing.ParseExpression.copy() on the rules doesn't help; the individual expressions get cloned alright, but the sub-expressions they are composed from don't get updated in any obvious way, and so none of the parse actions of any nested expression gets invoked.
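Roughly, this is what the copy() attempt looks like (again with made-up names):

from pyparsing import Word, alphas, nums

ident = Word(alphas)
ident.setParseAction(lambda tok: None)   # action attached to the sub-expression
assign = ident + '=' + Word(nums)

assign_copy = assign.copy()
# assign_copy is a new object, but it still refers to the very same
# `ident` instance, so the nested parse action can't be swapped per
# context without rebuilding the grammar from the bottom up.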
The only other way I can think of to get this effect would be to define a grammar that returns a context-less abstract parse tree and then process it in a second step. This seems awkward even for simple grammars: it would be nice to just raise an exception the moment an unrecognized name is used, and it still won't parse languages like C, which actually require context about what came before to know which rule matched.
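A sketch of that two-step idea (with a toy grammar, just to show where it gets awkward):

from pyparsing import Word, alphas, delimitedList

# first pass: a context-free parse that merely collects names
names = delimitedList(Word(alphas)).parseString('ham, spam, parrot')

# second pass: only now can the names be checked against a context,
# so the error for 'parrot' surfaces long after it was parsed
known = {'ham', 'spam'}
for name in names:
    if name not in known:
        raise NameError('unrecognized name: %s' % name)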
Is there another way of injecting context (without using a global variable, of course) into the parse actions of pyparsing expressions?
A bit late, but googling for pyparsing reentrancy brings up this topic, so here is my answer.
I've solved the issue of parser instance reuse/reentrancy by attaching the context to the string being parsed: subclass str, put your context in an attribute of the new string class, pass an instance of it to pyparsing, and get the context back in a parse action.
Python 2.7:
from pyparsing import LineStart, LineEnd, Word, alphas, Optional, Regex, Keyword, OneOrMore

# subclass str; note that unicode is not handled
class SpecStr(str):
    context = None  # will be set in spec_string() below

    # override as pyparsing calls str.expandtabs by default
    def expandtabs(self, tabs=8):
        ret = type(self)(super(SpecStr, self).expandtabs(tabs))
        ret.context = self.context
        return ret

# set context here rather than in the constructor
# to avoid messing with str.__new__ and super()
def spec_string(s, context):
    ret = SpecStr(s)
    ret.context = context
    return ret

class Actor(object):
    def __init__(self):
        self.namespace = {}

    def pair_parsed(self, instring, loc, tok):
        self.namespace[tok.key] = tok.value

    def include_parsed(self, instring, loc, tok):
        # doc = open(tok.filename.strip()).read()  # would use this line in real life
        doc = included_doc  # included_doc is defined below
        parse(doc, self)    # <<<<< recursion

def make_parser(actor_type):
    def make_action(fun):  # expects fun to be an unbound method of Actor
        def action(instring, loc, tok):
            if isinstance(instring, SpecStr):
                return fun(instring.context, instring, loc, tok)
            return None  # None as a result of a parse action means
                         # the tokens have not been changed
        return action

    # Sample grammar: a sequence of lines,
    # each line is either a 'key=value' pair or '#include filename'
    Ident = Word(alphas)
    RestOfLine = Regex('.*')
    Pair = (Ident('key') + '=' +
            RestOfLine('value')).setParseAction(make_action(actor_type.pair_parsed))
    Include = (Keyword('#include') +
               RestOfLine('filename')).setParseAction(make_action(actor_type.include_parsed))
    Line = (LineStart() + Optional(Pair | Include) + LineEnd())
    Document = OneOrMore(Line)
    return Document

Parser = make_parser(Actor)

def parse(instring, actor=None):
    if actor is not None:
        instring = spec_string(instring, actor)
    return Parser.parseString(instring)

included_doc = 'parrot=dead'
main_doc = """\
#include included_doc
ham = None
spam = ham"""

# parsing without context is ok
print 'parsed data:', parse(main_doc)

actor = Actor()
parse(main_doc, actor)
print 'resulting namespace:', actor.namespace
yields
['#include', 'included_doc', '\n', 'ham', '=', 'None', '\n', 'spam', '=', 'ham']
{'ham': 'None', 'parrot': 'dead', 'spam': 'ham'}
This approach makes the Parser itself perfectly reusable and reentrant. The pyparsing internals are generally reentrant too, as long as you don't touch ParserElement's static fields.
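For example, with the definitions above, two independent contexts can share the single module-level Parser:

actor_a = Actor()
actor_b = Actor()
parse('parrot=dead', actor_a)
parse('spam=ham', actor_b)
# actor_a.namespace -> {'parrot': 'dead'}
# actor_b.namespace -> {'spam': 'ham'}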
The only drawback is that pyparsing resets its packrat cache on each call to parseString, but this can be resolved by overriding SpecStr.__hash__ (to make it hashable like object, not str) and some monkeypatching. On my dataset this is not an issue at all, as the performance hit is negligible and this even favors memory usage.
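I won't reproduce the monkeypatching here, but the hashing part might look something like this (a sketch, not my exact code):

def _spec_str_hash(self):
    # hash by identity, like object, rather than by string value, so that
    # equal text parsed under different contexts gets separate cache keys
    return id(self)

SpecStr.__hash__ = _spec_str_hash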