
Improving error messages with pyparsing

Edit: I wrote a first version, and Eike helped me improve it considerably. I'm now stuck on a more specific problem, which I describe below. You can see the original question in the edit history.


I'm using pyparsing to parse a small language used to request specific data from a database. It features numerous keywords, operators, and datatypes, as well as boolean logic.

I'm trying to improve the error message shown to the user when they make a syntax error, since the current one is not very useful. I designed a small example, similar to what I'm doing with the language mentioned above but much smaller:

#!/usr/bin/env python                            

from pyparsing import *

def validate_number(s, loc, tokens):
    if int(tokens[0]) != 0:
        raise ParseFatalException(s, loc, "number must be 0")

def fail(s, loc, tokens):
    raise ParseFatalException(s, loc, "Unknown token %s" % tokens[0])

def fail_value(s, loc, expr, err):
    raise ParseFatalException(s, loc, "Wrong value")

number =  Word(nums).setParseAction(validate_number).setFailAction(fail_value)
operator = Literal("=")

error = Word(alphas).setParseAction(fail)
rules = MatchFirst([
    Literal('x') + operator + number,
])

rules = operatorPrecedence(rules | error , [
    (Literal("and"), 2, opAssoc.RIGHT),
])

def try_parse(expression):
    try:
        rules.parseString(expression, parseAll=True)
    except Exception as e:
        msg = str(e)
        print("%s: %s" % (msg, expression))
        print(" " * (len("%s: " % msg) + (e.loc)) + "^^^")

So basically, the only thing we can do with this language is write a series of x = 0 expressions, joined together with and and parentheses.

Now, there are cases involving and and parentheses where the error reporting is not very good. Consider the following examples:

>>> try_parse("x = a and x = 0") # This one is actually good!
Wrong value (at char 4), (line:1, col:5): x = a and x = 0
                                              ^^^
>>> try_parse("x = 0 and x = a")
Expected end of text (at char 6), (line:1, col:1): x = 0 and x = a
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (x = a)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (x = a)))
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))
                                                         ^^^

In fact, it seems that if the parser can't parse (and "parse" is the important word here) something after an and, it no longer produces good error messages.

And I do mean parse: if it can parse 5 but the "validation" fails in the parse action, it still produces a good error message. But if it can't parse a valid number (like a) or a valid keyword (like xxxxxx), it stops producing the right error messages.

Any ideas?

Jonathan Ballet asked Apr 09 '13 08:04


1 Answer

Pyparsing will always have somewhat poor error messages because it backtracks. The error message is generated by the last rule that the parser tries. The parser can't know where the error really is; it only knows that no rule matched.

For good error messages you need a parser that gives up early. These parsers are less flexible than Pyparsing, but most conventional programming languages can be parsed with such parsers. (C++ and Scala IMHO can't.)
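To make "gives up early" concrete, here is a minimal hand-written recursive-descent sketch for the question's toy language, in plain Python rather than Pyparsing. The names SyntaxErrorAt, tokenize, Parser, and parse are invented for this illustration, and the error messages merely imitate the ones in the question. Because each rule decides from the next token alone and never backtracks, the first failure is raised at the place where the input actually goes wrong.

```python
import re

class SyntaxErrorAt(Exception):
    """Hypothetical error type that remembers the offending offset."""
    def __init__(self, msg, loc):
        super().__init__("%s (at char %d)" % (msg, loc))
        self.msg = msg
        self.loc = loc

TOKEN = re.compile(r"\d+|[A-Za-z]+|[=()]")

def tokenize(text):
    """Return (token, offset) pairs, ending with a (None, len(text)) sentinel."""
    tokens, pos = [], 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            tokens.append((None, pos))
            return tokens
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxErrorAt("Unexpected character", pos)
        tokens.append((m.group(), pos))
        pos = m.end()

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i]

    def expect(self, wanted, msg):
        tok, loc = self.peek()
        if tok != wanted:
            raise SyntaxErrorAt(msg, loc)  # give up immediately, no retry
        self.i += 1

    def expression(self):
        # expression := term ("and" term)*
        self.term()
        while self.peek()[0] == "and":
            self.i += 1
            self.term()

    def term(self):
        # term := "(" expression ")" | "x" "=" number
        tok, loc = self.peek()
        if tok == "(":
            self.i += 1
            self.expression()
            self.expect(")", "Expected ')'")
        elif tok == "x":
            self.i += 1
            self.expect("=", "Expected '='")
            self.number()
        elif tok is not None and tok.isalpha():
            raise SyntaxErrorAt("Unknown token %s" % tok, loc)
        else:
            raise SyntaxErrorAt("Expected 'x' or '('", loc)

    def number(self):
        tok, loc = self.peek()
        if tok is None or not tok.isdigit():
            raise SyntaxErrorAt("Wrong value", loc)
        if int(tok) != 0:
            raise SyntaxErrorAt("number must be 0", loc)
        self.i += 1

def parse(text):
    """Parse the whole text or raise SyntaxErrorAt at the first problem."""
    p = Parser(text)
    p.expression()
    tok, loc = p.peek()
    if tok is not None:
        raise SyntaxErrorAt("Expected end of text", loc)
```

Calling parse("x = 0 and x = a") raises SyntaxErrorAt with loc pointing at the a, instead of "Expected end of text" at the and. The price is the one mentioned above: every rule has to be decidable from the next token, which is less flexible than Pyparsing.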

To improve error messages in Pyparsing, use the - operator. It works like the + operator, but it does not backtrack. You would use it like this:

assignment = Literal("let") - varname - "=" - expression
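As a rough illustration of what "does not backtrack" means, here is a toy model in plain Python. It is not Pyparsing's implementation, and every name in it (ParseFail, SoftFail, HardFail, lit, seq, first_of) is made up for the sketch. A sequence built with a commit point turns later failures into errors that the enclosing alternation does not swallow, which is roughly the effect described for - above.

```python
class ParseFail(Exception):
    """Base class for the toy failures; stores message and offset."""
    def __init__(self, msg, loc):
        super().__init__("%s (at char %d)" % (msg, loc))
        self.msg = msg
        self.loc = loc

class SoftFail(ParseFail):
    """Recoverable: the caller may backtrack and try another alternative."""

class HardFail(ParseFail):
    """Committed: alternatives must not be tried any more."""

def lit(word):
    """Match a literal word, skipping leading spaces."""
    def parse(text, loc):
        while loc < len(text) and text[loc] == " ":
            loc += 1
        if text.startswith(word, loc):
            return loc + len(word)
        raise SoftFail("Expected %r" % word, loc)
    return parse

def seq(parts, commit_after=None):
    """Match parts in order. Once `commit_after` parts have matched,
    a SoftFail from a later part is re-raised as a HardFail."""
    def parse(text, loc):
        for n, part in enumerate(parts):
            try:
                loc = part(text, loc)
            except SoftFail as e:
                if commit_after is not None and n >= commit_after:
                    raise HardFail(e.msg, e.loc) from None
                raise
        return loc
    return parse

def first_of(alternatives):
    """Try alternatives in order; only SoftFail triggers backtracking.
    If all fail, the failure of the last alternative is reported."""
    def parse(text, loc):
        last = None
        for alt in alternatives:
            try:
                return alt(text, loc)
            except SoftFail as e:
                last = e
        raise last
    return parse

let_parts = [lit("let"), lit("x"), lit("="), lit("0")]

# Like "+": a failure inside the let-statement lets first_of move on.
backtracking = first_of([seq(let_parts), lit("print")])

# Like "-" after the keyword: once "let" matched, errors are final.
committed = first_of([seq(let_parts, commit_after=1), lit("print")])
```

For the input "let x = 1", backtracking reports that it expected 'print' at offset 0, because that is the last alternative it tried, while committed reports that it expected '0' at offset 8, where the mistake actually is.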

Here is a small article on improving error reporting, by Pyparsing's author.

Edit

You could also generate good error messages for the invalid numbers in the parse actions that do the validation. If the number is invalid you raise an exception that is not caught by Pyparsing. This exception can contain a good error message.

Parse actions can have three arguments [1]:

  • s = the original string being parsed
  • loc = the location of the matching substring
  • toks = a list of the matched tokens, packaged as a ParseResults object

There are also three useful helper functions for creating good error messages [2]:

  • lineno(loc, string) - function to give the line number of the location within the string; the first line is line 1, newlines start new rows.
  • col(loc, string) - function to give the column number of the location within the string; the first column is column 1, newlines reset the column number to 1.
  • line(loc, string) - function to retrieve the line of text representing lineno(loc, string). Useful when printing out diagnostic messages for exceptions.
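The semantics of these three helpers can be sketched in plain Python. The functions below are stand-ins written for this illustration (my_lineno, my_col, my_line, and format_error are hypothetical names, not Pyparsing's code); they follow the descriptions in the list above, with 1-based line and column numbers, and show how the values combine into a diagnostic with a caret under the offending character.

```python
def my_lineno(loc, text):
    """1-based line number of offset `loc` within `text`."""
    return text.count("\n", 0, loc) + 1

def my_col(loc, text):
    """1-based column of offset `loc`; resets to 1 after each newline."""
    # rfind returns -1 when there is no earlier newline, giving loc + 1.
    return loc - text.rfind("\n", 0, loc)

def my_line(loc, text):
    """The full line of `text` that contains offset `loc`."""
    start = text.rfind("\n", 0, loc) + 1
    end = text.find("\n", loc)
    return text[start:] if end < 0 else text[start:end]

def format_error(msg, loc, text):
    """Build a three-line diagnostic: message, source line, caret."""
    column = my_col(loc, text)
    return "%s (line %d, col %d)\n%s\n%s^" % (
        msg, my_lineno(loc, text), column,
        my_line(loc, text), " " * (column - 1))

query = "x = 0 and\nx = a"
report = format_error("Wrong value", query.rindex("a"), query)
```

Here report names line 2, column 5 and places the caret under the a on the second line. With Pyparsing itself, the same kind of message could be built from lineno, col, and line, using the loc passed to a parse action or stored on a caught exception.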

Your validating parse action would then be like this:

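from pyparsing import lineno, col

class MyFatalParseException(Exception):
    """Custom exception, not derived from Pyparsing's, so Pyparsing does not catch it."""

def validate_odd_number(s, loc, toks):
    value = int(toks[0])
    if value % 2 == 0:
        # Report the position using the lineno() and col() helpers.
        raise MyFatalParseException(
            "not an odd number. Line {l}, column {c}.".format(l=lineno(loc, s),
                                                              c=col(loc, s)))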

[1] http://pythonhosted.org/pyparsing/pyparsing.pyparsing.ParserElement-class.html#setParseAction

[2] HowToUsePyparsing

Edit

Here [3] is an improved version of the question's current (2013-4-10) script. It gets the example errors right, but other errors are reported at the wrong position. I believe there are bugs in my version of Pyparsing ('1.5.7'), but maybe I just don't understand how Pyparsing works. The issues are:

  • ParseFatalException does not always seem to be fatal. The script works as expected when I use my own exception.
  • The - operator does not seem to work.

[3] http://pastebin.com/7E4kSnkm

Eike answered Nov 07 '22 09:11