Why is pyparsing removing a named result?

Question

Given the following minimal working example:

from pyparsing import *
latex_h  = QuotedString("$")('latex')
reg_text = Word(alphas)('text')
grammar  = OneOrMore( latex_h | reg_text )('line')

sol = grammar.parseString('''dog $x^2$ cat''')
print(sol.dump())

I expected the output to look like:

['dog', 'x^2', 'cat']
- line: ['dog', 'x^2', 'cat']
  - text: dog
  - latex: x^2
  - text: cat

but instead I got:

['dog', 'x^2', 'cat']
- latex: x^2
- line: ['dog', 'x^2', 'cat']
  - latex: x^2
  - text: cat
- text: cat

I don't understand why "dog" was left out of the parse tree. In addition, why do the text and latex entries also appear outside of line?

Laurence Dougal Myers · Accepted Answer

As Russell Borogove says, named results must be unique when on the same parsing level. You can't have a "line" with two or more named elements of the same type (e.g. two "text" or two "latex"), since they'll both use the same key in the underlying dictionary. I'll defer to Paul McGuire regarding the listAllMatches solution in the latest PyParsing, seeing as he wrote it and all :)
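As a rough sketch of the listAllMatches idea mentioned above (not necessarily how Paul McGuire would write it), the names can be set with setResultsName(..., listAllMatches=True) so that repeated matches accumulate instead of overwriting each other. The Group() wrapper and the variable names texts and latexes are my own additions for illustration, and the camelCase method names assume a pyparsing version that still accepts them, as in the question.

```python
from pyparsing import Group, OneOrMore, QuotedString, Word, alphas

# listAllMatches=True keeps every match under the name instead of only the last one
latex_h = QuotedString("$").setResultsName("latex", listAllMatches=True)
reg_text = Word(alphas).setResultsName("text", listAllMatches=True)

# Group() keeps the accumulated names inside 'line' (assumption: grouping is wanted here)
grammar = Group(OneOrMore(latex_h | reg_text))("line")

sol = grammar.parseString("dog $x^2$ cat")

texts = list(sol.line.text)      # ['dog', 'cat']
latexes = list(sol.line.latex)   # ['x^2']
print(texts, latexes)
```

Each name now maps to a list of matches, so "dog" is no longer displaced by "cat". The relative order between text and latex tokens is still only available from the positional list, sol.line.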

You can also work around this by attaching parse actions to "latex_h" or "reg_text"; however, this won't help if the "latex_h" element requires knowledge of any sibling "reg_text" elements. In that case you will probably need to break up your grammar a bit further, or use a tree-based approach to parsing (working from the lowest element up to the root, by parse actions and/or by iterating through the list of results), rather than a dictionary-based approach.
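One possible shape for the "iterate through the list of results" approach is sketched below. It wraps each element in its own Group() so that every token carries its own name, then walks the line in order. The names latex_item, text_item, and tagged are hypothetical, chosen for this illustration only, and the sketch assumes you only need to know each token's kind and value.

```python
from pyparsing import Group, OneOrMore, QuotedString, Word, alphas

# Each element gets its own Group, so its name lives in its own sub-result
latex_item = Group(QuotedString("$")("latex"))
text_item = Group(Word(alphas)("text"))
grammar = Group(OneOrMore(latex_item | text_item))("line")

sol = grammar.parseString("dog $x^2$ cat $y^3$")

# Walk the tokens in order and record which kind each one is
tagged = []
for item in sol.line:
    kind = "latex" if "latex" in item else "text"
    tagged.append((kind, item[0]))

print(tagged)
# [('text', 'dog'), ('latex', 'x^2'), ('text', 'cat'), ('latex', 'y^3')]
```

Because the names never share a dictionary, nothing is overwritten, and a "latex" element can inspect its neighbours simply by index.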

It's important to note that the parse tree did not leave "dog" behind. It was parsed correctly; its named result was simply overwritten in the dictionary by the later "text" match ("cat"). You can still access the parsed value by position, like so: sol.line[0]

As for why 'latex' and 'text' appear outside of 'line', you need to put the OneOrMore definition within a Group().

Here's an example that applies a parse action to the reg_text element at the time it's parsed (rather than when any parent element, like grammar, is parsed). It does not solve the 'named result' issue you are having, but without context on what you are trying to achieve with your parser I can't suggest a solution.

from pyparsing import *
latex_h  = QuotedString("$")('latex')
reg_text = Word(alphas)('text')
grammar  = Group(OneOrMore( latex_h | reg_text ))('line')

def parse_reg_text(s, loc, toks):
    if toks.text == 'dog':
        return "atomic " + toks.text
    else:
        return "ninja " + toks.text

reg_text.setParseAction(parse_reg_text)

sol = grammar.parseString('''dog $x^2$ cat $y^3$''')
print(sol.dump())

This gives the following output:

[['atomic dog', 'x^2', 'ninja cat', 'y^3']]
- line: ['atomic dog', 'x^2', 'ninja cat', 'y^3']
  - latex: y^3
  - text: ninja cat

Tags:

python

pyparsing

Hooked
