Edit: I wrote a first version, and Eike helped me improve it quite a bit. I'm now stuck on a more specific problem, which I describe below. You can look at the original question in the edit history.
I'm using pyparsing to parse a small language used to request specific data from a database. It features numerous keywords, operators and datatypes as well as boolean logic.
I'm trying to improve the error message sent to the user when they make a syntax error, since the current one is not very useful. I designed a small example, similar to what I'm doing with the aforementioned language but much smaller:
#!/usr/bin/env python
from pyparsing import *

def validate_number(s, loc, tokens):
    # Parse action: the only accepted number is 0.
    if int(tokens[0]) != 0:
        raise ParseFatalException(s, loc, "number must be 0")

def fail(s, loc, tokens):
    # Parse action for the catch-all "error" rule below.
    raise ParseFatalException(s, loc, "Unknown token %s" % tokens[0])

def fail_value(s, loc, expr, err):
    # Fail action: called when "number" does not match at all.
    raise ParseFatalException(s, loc, "Wrong value")

number = Word(nums).setParseAction(validate_number).setFailAction(fail_value)
operator = Literal("=")
error = Word(alphas).setParseAction(fail)

rules = MatchFirst([
    Literal('x') + operator + number,
])

rules = operatorPrecedence(rules | error, [
    (Literal("and"), 2, opAssoc.RIGHT),
])

def try_parse(expression):
    try:
        rules.parseString(expression, parseAll=True)
    except Exception as e:
        msg = str(e)
        print("%s: %s" % (msg, expression))
        # Put the marker under the character at e.loc in the echoed expression.
        print(" " * (len("%s: " % msg) + (e.loc)) + "^^^")
So basically, the only thing we can do with this language is write a series of x = 0 expressions, joined together with "and" and parentheses.
Now, there are cases, when "and" and parentheses are used, where the error reporting is not very good. Consider the following examples:
>>> try_parse("x = a and x = 0") # This one is actually good!
Wrong value (at char 4), (line:1, col:5): x = a and x = 0
                                              ^^^
>>> try_parse("x = 0 and x = a")
Expected end of text (at char 6), (line:1, col:1): x = 0 and x = a
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (x = a)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (x = a)))
                                                         ^^^
>>> try_parse("x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))")
Expected end of text (at char 6), (line:1, col:1): x = 0 and (x = 0 and (x = 0 and (xxxxxxxx = 0)))
                                                         ^^^
Actually, it seems that if the parser can't parse (and "parse" is the important word here) something after an "and", it doesn't produce good error messages anymore :(
And I do mean parse: if it can parse 5 but the "validation" fails in the parse action, it still produces a good error message. But if it can't parse a valid number (like a) or a valid keyword (like xxxxxx), it stops producing the right error messages.
Any idea?
Pyparsing will always have somewhat bad error messages, because it backtracks. The error message is generated in the last rule that the parser tries. The parser can't know where the error really is, it only knows that there is no matching rule.
For good error messages you need a parser that gives up early. These parsers are less flexible than Pyparsing, but most conventional programming languages can be parsed with such parsers. (C++ and Scala IMHO can't.)
To improve error messages in Pyparsing, use the - operator. It works like the + operator, but it does not backtrack. You would use it like this:
assignment = Literal("let") - varname - "=" - expression
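To make the difference concrete, here is a minimal sketch that builds the same statement once with + and once with -, then parses an input whose second statement is broken. The definitions of varname and expression, the sample text, and the error_of helper are assumptions made for illustration; they are not part of the answer above.

```python
from pyparsing import (Literal, OneOrMore, ParseException,
                       ParseSyntaxException, Word, alphas, nums)

# Assumed definitions for illustration only.
varname = Word(alphas)
expression = Word(nums)

# The same statement twice: "+" lets the parser back out of a half-matched
# statement, "-" commits to the statement once "let" has matched.
soft_assignment = Literal("let") + varname + "=" + expression
hard_assignment = Literal("let") - varname - "=" - expression

# The real mistake is the "x" at index 18 (a number is expected there).
text = "let a = 1 let b = x"


def error_of(grammar):
    """Hypothetical helper: return (exception class name, loc) of the failure."""
    try:
        grammar.parseString(text, parseAll=True)
    except (ParseException, ParseSyntaxException) as e:
        return type(e).__name__, e.loc
    return None


soft_error = error_of(OneOrMore(soft_assignment))
hard_error = error_of(OneOrMore(hard_assignment))
print(soft_error)
print(hard_error)
```

With +, the repetition should quietly give up on the second statement, so the failure is reported as unexpected text where that statement starts (index 10). With -, the failure should surface as a ParseSyntaxException at the offending token (index 18), which is the position a user actually needs.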
Here is a small article on improving error reporting, by Pyparsing's author.
Edit
You could also generate good error messages for the invalid numbers in the parse actions that do the validation. If the number is invalid you raise an exception that is not caught by Pyparsing. This exception can contain a good error message.
Parse actions can have three arguments [1]:
s - the original string being parsed
loc - the location of the matching substring
toks - a list of the matched tokens, packaged as a ParseResults object
There are also three useful helper methods for creating good error messages [2]:
lineno(loc, string) - function to give the line number of the location within the string; the first line is line 1, newlines start new rows.
col(loc, string) - function to give the column number of the location within the string; the first column is column 1, newlines reset the column number to 1.
line(loc, string) - function to retrieve the line of text representing lineno(loc, string). Useful when printing out diagnostic messages for exceptions.
Your validating parse action would then be like this:
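from pyparsing import lineno, col

class MyFatalParseException(Exception):
    """Custom exception; Pyparsing does not catch it, so it reaches the caller."""

def validate_odd_number(s, loc, toks):
    value = toks[0]
    value = int(value)
    if value % 2 == 0:
        raise MyFatalParseException(
            "not an odd number. Line {l}, column {c}.".format(l=lineno(loc, s),
                                                              c=col(loc, s)))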
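As an end-to-end illustration, the sketch below attaches such a parse action to a small grammar and catches the exception outside of Pyparsing. It also uses line() to quote the offending source line and put a caret under the bad token. The OddNumberError name, the grammar and the sample input are hypothetical and chosen only for this example.

```python
from pyparsing import OneOrMore, Word, col, line, lineno, nums


class OddNumberError(Exception):
    """Hypothetical exception type; Pyparsing does not know or catch it."""


def validate_odd_number(s, loc, toks):
    if int(toks[0]) % 2 == 0:
        # Build the diagnostic from the three helpers: line number, column,
        # the offending source line, and a caret under the bad token.
        raise OddNumberError(
            "not an odd number. Line {l}, column {c}.\n{text}\n{caret}".format(
                l=lineno(loc, s),
                c=col(loc, s),
                text=line(loc, s),
                caret=" " * (col(loc, s) - 1) + "^"))


# Assumed toy grammar: one or more whitespace-separated odd numbers.
odd_numbers = OneOrMore(Word(nums).setParseAction(validate_odd_number))

try:
    odd_numbers.parseString("1 3\n5 8 7", parseAll=True)
    message = None
except OddNumberError as e:
    # The custom exception passes through Pyparsing untouched.
    message = str(e)

print(message)
```

Because the exception is not a Pyparsing exception, no enclosing rule swallows it while backtracking, so the message built at the point of failure is the one the user sees.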
[1] http://pythonhosted.org/pyparsing/pyparsing.pyparsing.ParserElement-class.html#setParseAction
[2] HowToUsePyparsing
Edit
Here [3] is an improved version of the question's current (2013-4-10) script. It gets the example errors right, but other errors are indicated at the wrong position. I believe there are bugs in my version of Pyparsing ('1.5.7'), but maybe I just don't understand how Pyparsing works. The issues are:
The - operator seems not to work.
[3] http://pastebin.com/7E4kSnkm