Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you parse node and node relationships in pyparsing?

I've built a raw parser but I'd really like to get this working in pyparsing.

there's two types of strings I'd like to parse. One that parses just nodes and the second node relationships

verb node1, node2, ... 

and

verb node1->node2->node3

you can specify one or more nodes, which can be quoted additionally you can indicate that a node is inside another node by adding a ^

verb node1, node2 ^ node3, node4

You may also want to indicate node relationships by using a ->, <- or <-> indicator.

verb node1->node2<->node3

Again you can indicate that a node is inside another by using a ^

verb node1->node2^node4<->node3
like image 619
Marinus Avatar asked Jan 28 '26 11:01

Marinus


1 Answers

A conceptual BNF for this format would look like:

node :: word composed of alphas, digits, '_'
verb :: one of several defined keywords
binop :: '->' | '<-' | '<->'
nodeFactor :: node '^' node | node
nodeExpr :: nodeFactor op nodeFactor
nodeCommand :: verb nodeExpr [',' nodeExpr]...

This maps to pyparsing almost step for step:

from pyparsing import (Word,alphas,alphanums,Keyword,
    infixNotation,opAssoc,oneOf,delimitedList)

nodeRef = Word(alphas,alphanums+'_')
GO, TURN, FOLLOW = map(Keyword, "GO TURN FOLLOW".split())
verb = GO | TURN | FOLLOW
binop = oneOf('-> <- <->')

The next part is most easily implemented using pyparsing's infixNotation method (formerly known as operatorPrecedence). infixNotation allows us to define a hierarchy of operations, and will group the parsed output according to the precedence defined by the hierarchy. I'm assuming that your '^' "is inside" operator should be evaluated before the binary '->', etc. operators. infixNotation also allows for nesting within parentheses, but none of your examples show this as absolutely needed. You define the infixNotation by specifying the base operand type, followed by a list of 3-tuples, each showing the operator, a value 1,2 or 3 for unary, binary or ternary operators, and the constant opAssoc.LEFT or RIGHT for left or right associativity of the operator:

nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),
    (binop, 2, opAssoc.LEFT),
    ])

Finally, we define the overall expression, which I've interpreted as some sort of command. The comma-separated list of node expressions could be directly implemented as nodeExpr + ZeroOrMore(Suppress(',') + nodeExpr) (we suppress the commas from the parsed output - they are useful at parse time, but afterwards we just have to skip over them). But this comes up so often, pyparsing offers the method delimitedList:

nodeCommand = verb('verb') + delimitedList(nodeExpr)('nodes')

The names 'verb' and 'nodes' cause the results parsed in the respective expressions to be associated with those names, which will make it easier to work with the parsed data after the parsing is done.

Now to test the parser:

tests = """\
    GO node1,node2
    TURN node1->node2->node3
    GO node1,node2^node3,node4
    FOLLOW node1->node2<->node3
    GO node5,node1->node2^node4<->node3,node6
    """.splitlines()
for test in tests:
    test = test.strip()
    if not test:
        continue
    print (test)
    try:
        result = nodeCommand.parseString(test, parseAll=True)
        print (result.dump())
    except ParseException as pe:
        print ("Failed:", test)
        print (pe)

The dump() method prints out the parsed tokens as a nested list, then lists each results name and its attached value:

GO node1,node2
['GO', 'node1', 'node2']
- nodes: ['node1', 'node2']
- verb: GO
TURN node1->node2->node3
['TURN', ['node1', '->', 'node2', '->', 'node3']]
- nodes: [['node1', '->', 'node2', '->', 'node3']]
- verb: TURN
GO node1,node2^node3,node4
['GO', 'node1', ['node2', '^', 'node3'], 'node4']
- nodes: ['node1', ['node2', '^', 'node3'], 'node4']
- verb: GO
FOLLOW node1->node2<->node3
['FOLLOW', ['node1', '->', 'node2', '<->', 'node3']]
- nodes: [['node1', '->', 'node2', '<->', 'node3']]
- verb: FOLLOW
GO node5,node1->node2^node4<->node3,node6
['GO', 'node5', ['node1', '->', ['node2', '^', 'node4'], '<->', 'node3'], 'node6']
- nodes: ['node5', ['node1', '->', ['node2', '^', 'node4'], '<->', 'node3'], 'node6']
- verb: GO

At this point, you can just parse your commands, and then based on the verb, dispatch to whichever appropriate method performs that verb.

But let me suggest a structure that I've found to be helpful to capture this logic using Python objects. Define a simple class hierarchy of commands that implement the various verb functions in an abstract method doCommand:

# base class
class Command(object):
    def __init__(self, tokens):
        self.cmd = tokens.verb
        self.nodeExprs = tokens.nodes

    def doCommand(self):
        """
        Execute command logic, using self.cmd and self.nodeExprs.
        To be overridden in sub classes.
        """
        print (self.cmd, '::', self.nodeExprs.asList())

# these should implement doCommand, but not needed for this example
class GoCommand(Command): pass
class TurnCommand(Command): pass
class FollowCommand(Command): pass

This method will convert your parsed results to an instance of the appropriate command class:

verbClassMap = {
    'GO' : GoCommand,
    'TURN' : TurnCommand,
    'FOLLOW' : FollowCommand,
    }
def tokensToCommand(tokens):
    cls = verbClassMap[tokens.verb]
    return cls(tokens)

But you can also build this into your parser as a parse-time callback, so that once the parsing is done, you get not just a list of strings and sublists, but an object ready to be "executed" by invoking its doCommand method. To do this, just attach tokensToCommand as a parse action on the overall nodeCommand expression:

nodeCommand.setParseAction(tokensToCommand)

Now we slightly modify our test code:

for test in tests:
    test = test.strip()
    if not test:
        continue
    try:
        result = nodeCommand.parseString(test, parseAll=True)
        result[0].doCommand()
    except ParseException as pe:
        print ("Failed:", test)
        print (pe)

Since we didn't actually implement doCommand on the subclasses, all we get is the default base class behavior, which is to just echo back the parsed verb and the node list:

GO :: ['node1', 'node2']
TURN :: [['node1', '->', 'node2', '->', 'node3']]
GO :: ['node1', ['node2', '^', 'node3'], 'node4']
FOLLOW :: [['node1', '->', 'node2', '<->', 'node3']]
GO :: ['node5', ['node1', '->', ['node2', '^', 'node4'], '<->', 'node3'], 'node6']

(This code was run with Python 3, pyparsing 2.0.0. It will also run with Python 2, pyparsing 1.5.7.)

EDIT

To get your chained expressions a op b op c into [a,op,b], [b, op, c], use a parse action to restructure the [a,op,b,op,c] results as parsed into the pairwise expressions. The infixNotation method allows you to define a parse action to attach to level in the operator hierarchy.

A method to restructure the chained expression results looks like:

def expandChainedExpr(tokens):
    ret = ParseResults([])
    tokeniter = iter(tokens[0])
    lastexpr = next(tokeniter)
    for op,nextexpr in zip(tokeniter,tokeniter):
        ret += ParseResults([[lastexpr, op, nextexpr]])
        lastexpr = nextexpr
    return ret

This builds an entirely new ParseResults to replace the original chained results. Notice how each lastexpr op nextexpr gets saved as its own subgroup, then nextexpr gets copied to lastexpr, and then loops around to get the next op-nextexpr pair.

To attach this reformatter into the parser, add it as a fourth element of that hierarchy's level in the infixNotation:

nodeExpr = infixNotation(nodeRef,
    [
    ('^', 2, opAssoc.LEFT),
    (binop, 2, opAssoc.LEFT, expandChainedExpr),
    ])

Now the output of:

FOLLOW node1->node2<->node3

is expanded to:

('FOLLOW', '::', [['node1', '->', 'node2'], ['node2', '<->', 'node3']])
like image 140
PaulMcG Avatar answered Jan 30 '26 00:01

PaulMcG



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!