Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parser Combinators: Does repsep allows back-tracking?

Consider example of the parser like this:

object TestParser extends RegexParsers {
    override protected val whiteSpace = """[ \t]*""".r  

    def eol = """(\r?\n)+""".r
    def item = "[a-zA-Z][a-zA-Z0-9-]*".r
    def list = "items:" ~> rep1sep(item,",") 
    def constraints = "exclude:" ~> item

    def itemsDefinition = (rep1sep(list, eol) ~ repsep(constraints,eol))
}

If I try to parse this input (without two lines contains exclude works OK):

items: item1, item2, item3, item3, item4
items: item2, item3, item3, item5, item4    
items: item4, item5, item6, item10      
items: item1, item2, item3
exclude: item1
exclude: item2

I get following error:

[5.5] failure: `items:' expected but `e' found

       exclude: item1

       ^

The problem is obvious this line:

def itemsDefinition = (rep1sep(list, eol) ~ repsep(constraints,eol))

What is reason it doesn't work. Does it have something to do with backtracking? What alternatives do I have to make it work?

like image 970
PrimosK Avatar asked Feb 22 '12 15:02

PrimosK


1 Answers

You would need an eol between your lists and your constraints

(rep1sep(list, eol) <~ eol) ~ repsep(constraint,eol)

Completing the answer:

Your grammar specify eol as a separator between lists, not a terminator. It would accept an input where the first exclude appears just after the last item3 (with a whitespace, but not a new line).

After your parser reach the unwanted eol, it looks for items, and find excludes instead. Which gives the displayed error message. Then, the parser DOES indeed backtrack, to the previous new line. It considers the possibility that the lists part stops there, and look for excludes. But if finds an eol instead. So another possible error message would be "excludes expected, eol found", which in this case would have been more helpful

When there is a choice in the grammar, and no branch succeeds, the parser returns the error with the farthermost position, which is normally the right strategy. Suppose you grammar allows an "if" or a "for", and the input is "if !!!". On the if branch, the error would be something like "(" expected, "!" found. On the for branch, the message would be "for expected, if found". Clearly, the message from the if branch, which appears on the second token, is way better than the message from the for branch, on the first token and not relevant at all.

On the question of separator/terminator, you may consider :

  • separator (; in Pascal) : repsep(item, separator)
  • terminator (; in C) : rep(item <~ terminator)
  • flexible : repsep(item, separator) <~ separator?

the last one would allow for a single separator after no items at all. If this is undesirable, maybe (rep1sep(item, separator) <~ separator?)?.

like image 132
Didier Dupont Avatar answered Sep 19 '22 09:09

Didier Dupont