Distinguish matches in pyparsing

I want to parse some words and some numbers with pyparsing. Simple, right?

from pyparsing import *

A = Word(nums).setResultsName('A')
B = Word(alphas).setResultsName('B')
expr = OneOrMore(A | B)

result = expr.parseString("123 abc 456 7 d")
print(result)

The code above prints ['123', 'abc', '456', '7', 'd'], so everything worked. Now I want to do some work with these parsed values, and for that I need to know whether each one matched A or B. Is there a way to distinguish between the two?

The only thing I found after some research was the items method of the ParseResults class. But it returns only [('A', '7'), ('B', 'd')], that is, only the last match for each name.

My plan / goal is the following:

for elem in result:
    if elem.is_of_type('A'):
        # do stuff
    elif elem.is_of_type('B'):
        # do something else

How do I distinguish between A and B?

Jakube asked Mar 26 '15 15:03


2 Answers

Nice job with getName(). You can also explicitly decorate the returned tokens with a marker, indicating which match was made:

def makeDecoratingParseAction(marker):
    def parse_action_impl(s,l,t):
        return (marker, t[0])
    return parse_action_impl

A = Word(nums).setParseAction(makeDecoratingParseAction("A"))
B = Word(alphas).setParseAction(makeDecoratingParseAction("B"))
expr = OneOrMore(A | B)

result = expr.parseString("123 abc 456 7 d")
print(result.asList())

Gives:

[('A', '123'), ('B', 'abc'), ('A', '456'), ('A', '7'), ('B', 'd')]

Now you can iterate over the returned tuples, and each one is labelled with the appropriate marker.
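As a rough illustration of that iteration, the sketch below unpacks each (marker, value) tuple and branches on the marker. The describe helper and its messages are hypothetical, made up for this example; only the decorating parse action comes from the code above.

```python
from pyparsing import OneOrMore, Word, alphas, nums


def make_decorating_parse_action(marker):
    # Same idea as makeDecoratingParseAction above: tag the token with a marker.
    def parse_action_impl(s, l, t):
        return (marker, t[0])
    return parse_action_impl


A = Word(nums).setParseAction(make_decorating_parse_action("A"))
B = Word(alphas).setParseAction(make_decorating_parse_action("B"))
expr = OneOrMore(A | B)


def describe(marker, value):
    # Hypothetical per-type handling; replace with the real work for A and B.
    if marker == "A":
        return "number %d" % int(value)
    elif marker == "B":
        return "word %s" % value
    raise ValueError("unknown marker: %r" % (marker,))


# Each element of the parse result is one (marker, value) tuple.
descriptions = [describe(marker, value)
                for marker, value in expr.parseString("123 abc 456 7 d")]
print(descriptions)
```

Branching on a plain string marker keeps the grammar and the post-processing loosely coupled, at the cost of a growing if/elif chain as more token types are added.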

You can take this a step further and use a class to capture both the type and the type-specific post-parse logic, and then pass the class as the expression's parse action. This will create instances of the classes in the returned ParseResults, which you can then execute directly with some sort of exec or doIt method:

class ResultsHandler(object):
    """Define base class to initialize location and tokens.
       Call subclass-specific post_init() if one is defined."""
    def __init__(self, s,locn,tokens):
        self.locn = locn
        self.tokens = tokens
        if hasattr(self, "post_init"):
            self.post_init()

class AHandler(ResultsHandler):
    """Handler for A expressions, which contain a numeric string."""
    def post_init(self):
        self.int_value = int(self.tokens[0])
        self.odd_even = ("EVEN","ODD")[self.int_value % 2]
    def doIt(self):
        print("An A-Type was found at %d with value %d, which is an %s number" % (
                self.locn, self.int_value, self.odd_even))

class BHandler(ResultsHandler):
    """Handler for B expressions, which contain an alphabetic string."""
    def post_init(self):
        self.string = self.tokens[0]
        self.vowels_count = sum(self.string.lower().count(c) for c in "aeiou")
    def doIt(self):
        print("A B-Type was found at %d with value %s, and contains %d vowels" % (
                self.locn, self.string, self.vowels_count))


# pass expression-specific handler classes as parse actions
A = Word(nums).setParseAction(AHandler)
B = Word(alphas).setParseAction(BHandler)
expr = OneOrMore(A | B)

# parse string and run handlers
result = expr.parseString("123 abc 456 7 d")
for handler in result:
    handler.doIt()

Prints:

An A-Type was found at 0 with value 123, which is an ODD number
A B-Type was found at 4 with value abc, and contains 1 vowels
An A-Type was found at 8 with value 456, which is an EVEN number
An A-Type was found at 12 with value 7, which is an ODD number
A B-Type was found at 14 with value d, and contains 0 vowels
PaulMcG answered Sep 22 '22 05:09


In your .setResultsName() calls, you need to specify listAllMatches=True. It defaults to False, in which case each results name keeps only its last match, which is why items() showed just ('A', '7') and ('B', 'd'). Once you've done that, you can loop over result and check whether each token was matched by a given expression by testing for membership in result['A'] or result['B'].

from pyparsing import *

#                                    ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓
A = Word(nums  ).setResultsName('A', listAllMatches=True)
B = Word(alphas).setResultsName('B', listAllMatches=True)
expr = OneOrMore(A | B)

result = expr.parseString("123 abc 456 7 d")

for elem in result:
    if elem in list(result['A']):
        print(elem, 'is in A')
    elif elem in list(result['B']):
        print(elem, 'is in B')

This prints:

123 is in A
abc is in B
456 is in A
7 is in A
d is in B

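This is kludgey, and I'm not sure if it's the canonically-correct way of doing this, but it seems to work. Note that the membership test compares token values, so it relies on no string being able to match both A and B.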
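For comparison, here is a minimal sketch of the getName() approach mentioned in the other answer. It assumes that wrapping each expression in Group and naming the group makes every element of the result a small ParseResults that remembers its own name; treat it as one possible arrangement rather than the canonical one. The labelled list is only there for illustration.

```python
from pyparsing import Group, OneOrMore, Word, alphas, nums

# Assumption: naming the Group (not the bare Word) lets each element of the
# result report its own results name through getName().
A = Group(Word(nums)).setResultsName("A")
B = Group(Word(alphas)).setResultsName("B")
expr = OneOrMore(A | B)

result = expr.parseString("123 abc 456 7 d")

# Each elem is a one-token group; elem[0] is the matched string.
labelled = [(elem.getName(), elem[0]) for elem in result]

for name, value in labelled:
    if name == "A":
        print(value, "is in A")
    elif name == "B":
        print(value, "is in B")
```

Unlike the membership test, this asks each token which expression produced it, so it does not depend on A and B matching disjoint sets of strings.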

senshin answered Sep 22 '22 05:09