
Pyparsing newbie setParseAction modifying tokens

I'm new to Pyparsing (and pretty new to Python). I have tried to reduce my problem down to the simplest form that will illustrate what's going wrong (to the point where I probably wouldn't need Pyparsing at all!)

Suppose I've got a string consisting of letters and numbers, such as "b7 z4 a2 d e c3". There's always a letter, but the number is optional. I want to parse this into its individual elements and then process them, but where there is a bare letter with no number, it would be handy to change it so that it has the "default" number 1 after it. Then I could process every element in a consistent way. I thought I could do this with setParseAction, as follows:

from pyparsing import *
teststring = "a2 b5 c9 d e z"
expected_letter = Word("ABCDEFGabcdefgzZxy", exact=1)
expected_number = Word(nums)
letter_and_number = expected_letter + expected_number
bare_letter = expected_letter
bare_letter.setParseAction( lambda s,l,t:  t.append("1") )
elements =  letter_and_number | bare_letter
line = OneOrMore(elements)
print(line.parseString(teststring))

Unfortunately, the t.append() doesn't do what I'm expecting, which was to add a "1" to the list of parsed tokens. Instead, I get an error: TypeError: 'str' object is not callable.

I'm probably just being really thick here, but could one of you experts please set me straight?

Thanks

Steve

STEPHEN WEST asked Dec 01 '12 13:12


1 Answer

One of the basic concepts to grasp about pyparsing is that it does not work with plain lists of strings; it assembles the parsed pieces into a ParseResults object. ParseResults is a rich data type defined in pyparsing that can be accessed as a list, or as a dict or object when tokens have been parsed from a ParserElement with a defined results name.
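As a small illustration of those access styles, here is a sketch that assumes a current pyparsing install. The results names "letter" and "number" and the variable names are chosen for the example only; they are not part of the question's parser.

```python
from pyparsing import Word, alphas, nums

# Calling an expression with a string attaches a results name to a copy of it.
entry = Word(alphas, exact=1)("letter") + Word(nums)("number")
result = entry.parseString("b7")

as_list = result.asList()    # plain Python list of the matched tokens
by_index = result[0]         # list-style access
by_key = result["number"]    # dict-style access via the results name
by_attr = result.letter      # object-style access via the results name

print(as_list, by_index, by_key, by_attr)
```

Without results names, only the list-style access is available.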

However, while ParseResults was designed with easy access in mind, it is limited in the ways it can be updated. Internally in pyparsing, each expression that matches creates a small ParseResults object; if this is part of a larger expression, that expression accumulates the pieces into a larger ParseResults using the += operator.

In your case, you can append to the ParseResults that is passed in by creating a small ParseResults containing "1" and adding it to t:

t += ParseResults("1")

Unfortunately, this won't work in a lambda, because += is a statement and a lambda body can only be an expression - you could try

lambda s,l,t: t.__iadd__(ParseResults("1"))

But this feels a little too clever.
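A plain named function avoids the trick, since the += statement can sit in its body. The following is a sketch rather than a definitive fix: add_default_number is a name made up for this example, and the .copy() call is an addition to the question's code. In the question, bare_letter and expected_letter are the same object, so a parse action set on one would presumably also fire for the letter inside letter_and_number.

```python
from pyparsing import OneOrMore, ParseResults, Word, nums

def add_default_number(s, loc, toks):
    # Extend the tokens in place; returning None keeps the modified tokens.
    toks += ParseResults("1")

teststring = "a2 b5 c9 d e z"
expected_letter = Word("ABCDEFGabcdefgzZxy", exact=1)
expected_number = Word(nums)
letter_and_number = expected_letter + expected_number

# Copy first, so the parse action is attached only to the bare-letter case.
bare_letter = expected_letter.copy().setParseAction(add_default_number)

elements = letter_and_number | bare_letter
line = OneOrMore(elements)

result = line.parseString(teststring).asList()
print(result)
```

Every letter is now followed by a number token, so the elements can be processed in pairs.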

You might also rethink your parser a bit, to take advantage of the Optional class. Think of your trailing digit as an optional element, for which you can define a default value to provide in case the element is missing. I think you can define what you want with just:

>>> letter = Word(alphas,exact=1)
>>> digit = Word(nums,exact=1)
>>> teststring= "a2 b5 c9 d e z"
>>> letter_and_digit = Combine(letter + Optional(digit, default="1"))
>>> print (sum(letter_and_digit.searchString(teststring)))
['a2', 'b5', 'c9', 'd1', 'e1', 'z1']

Combine is used to rejoin the separate letters and digits into strings, otherwise each match would look like ['a','2'], ['b','5'], etc.

(Normally, searchString returns a list of ParseResults objects, which would look like a list of single-element lists. Passing the results of searchString to sum adds them all together into a single ParseResults of strings.)
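To see what Combine and sum each contribute, this sketch runs the same letter-plus-optional-digit expression both ways. The variable names pair, uncombined, and combined are chosen here for illustration.

```python
from pyparsing import Combine, Optional, Word, alphas, nums

letter = Word(alphas, exact=1)
digit = Word(nums, exact=1)
teststring = "a2 b5 c9 d e z"

pair = letter + Optional(digit, default="1")

# Without Combine: each match stays a separate two-token result.
uncombined = [match.asList() for match in pair.searchString(teststring)]

# With Combine: each match becomes one string, and sum() merges the
# per-match ParseResults into a single flat ParseResults.
combined = sum(Combine(pair).searchString(teststring)).asList()

print(uncombined)
print(combined)
```

The uncombined form may be the more convenient one if the letter and number are to be processed separately.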

PaulMcG answered Oct 17 '22 17:10