Consider example of the parser like this:
object TestParser extends RegexParsers {
override protected val whiteSpace = """[ \t]*""".r
def eol = """(\r?\n)+""".r
def item = "[a-zA-Z][a-zA-Z0-9-]*".r
def list = "items:" ~> rep1sep(item,",")
def constraints = "exclude:" ~> item
def itemsDefinition = (rep1sep(list, eol) ~ repsep(constraints,eol))
}
If I try to parse this input (without two lines contains exclude works OK):
items: item1, item2, item3, item3, item4
items: item2, item3, item3, item5, item4
items: item4, item5, item6, item10
items: item1, item2, item3
exclude: item1
exclude: item2
I get following error:
[5.5] failure: `items:' expected but `e' found
exclude: item1
^
The problem is obvious this line:
def itemsDefinition = (rep1sep(list, eol) ~ repsep(constraints,eol))
What is reason it doesn't work. Does it have something to do with backtracking? What alternatives do I have to make it work?
You would need an eol between your lists and your constraints
(rep1sep(list, eol) <~ eol) ~ repsep(constraint,eol)
Completing the answer:
Your grammar specify eol as a separator between lists, not a terminator. It would accept an input where the first exclude
appears just after the last item3
(with a whitespace, but not a new line).
After your parser reach the unwanted eol
, it looks for items
, and find excludes
instead. Which gives the displayed error message. Then, the parser DOES indeed backtrack, to the previous new line. It considers the possibility that the lists part stops there, and look for excludes. But if finds an eol instead. So another possible error message would be "excludes expected, eol found"
, which in this case would have been more helpful
When there is a choice in the grammar, and no branch succeeds, the parser returns the error with the farthermost position, which is normally the right strategy. Suppose you grammar allows an "if"
or a "for"
, and the input is "if !!!"
. On the if
branch, the error would be something like "(" expected, "!" found
. On the for
branch, the message would be "for expected, if found"
. Clearly, the message from the if
branch, which appears on the second token, is way better than the message from the for
branch, on the first token and not relevant at all.
On the question of separator/terminator, you may consider :
;
in Pascal) : repsep(item, separator)
;
in C) : rep(item <~ terminator)
repsep(item, separator) <~ separator?
the last one would allow for a single separator after no items at all. If this is undesirable, maybe (rep1sep(item, separator) <~ separator?)?
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With