Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grammars, Scala Parsing Combinators and Orderless Sets

I'm writing an application that will take in various "command" strings. I've been looking at the Scala combinator library to tokenize the commands. I find in a lot of cases I want to say: "These tokens are an orderless set, and so they can appear in any order, and some might not appear".

With my current knowledge of grammars I would have to define all combinations of sequences as such (pseudo grammar):

command = action~content
action = alphanum
content  = (tokenA~tokenB~tokenC | tokenB~tokenC~tokenA | tokenC~tokenB~tokenA ....... )

So my question is, considering tokenA-C are unique, is there a shorter way to define a set of any order using a grammar?

like image 874
BefittingTheorem Avatar asked Nov 23 '09 08:11

BefittingTheorem


3 Answers

You can use the "Parser.^?" operator to check a group of parse elements for duplicates.

  def tokens = tokenA | tokenB | tokenC
  def uniqueTokens = (tokens*) ^? (
    { case t if (t == t.removeDuplicates) => t },
    { "duplicate tokens found: " + _ })

Here is an example that allows you to enter any of the four stooges in any order, but fails to parse if a duplicate is encountered:

package blevins.example

import scala.util.parsing.combinator._  

case class Stooge(name: String)

object StoogesParser extends RegexParsers {
  def moe = "Moe".r
  def larry = "Larry".r
  def curly = "Curly".r
  def shemp = "Shemp".r
  def stooge = ( moe | larry | curly | shemp ) ^^ { case s => Stooge(s) }
  def certifiedStooge = stooge | """\w+""".r ^? (
    { case s: Stooge => s },
    { "not a stooge: " + _ })

  def stooges = (certifiedStooge*) ^? (
    { case x if (x == x.removeDuplicates) => x.toSet },
    { "duplicate stooge in: " + _ })

  def parse(s: String): String = {
    parseAll(stooges, new scala.util.parsing.input.CharSequenceReader(s)) match {
      case Success(r,_) => r.mkString(" ")
      case Failure(r,_) => "failure: " + r
      case Error(r,_) => "error: " + r
    }
  }

}

And some example usage:

package blevins.example

object App extends Application {

  def printParse(s: String): Unit = println(StoogesParser.parse(s))

  printParse("Moe Shemp Larry")
  printParse("Moe Shemp Shemp")
  printParse("Curly Beyonce")

  /* Output:
     Stooge(Moe) Stooge(Shemp) Stooge(Larry)
     failure: duplicate stooge in: List(Stooge(Moe), Stooge(Shemp), Stooge(Shemp))
     failure: not a stooge: Beyonce
  */
}
like image 198
Mitch Blevins Avatar answered Oct 20 '22 09:10

Mitch Blevins


There are ways around it. Take a look at the parser here, for example. It accepts 4 pre-defined numbers, which may appear in any other, but must appear once, and only once.

OTOH, you could write a combinator, if this pattern happens often:

def comb3[A](a: Parser[A], b: Parser[A], c: Parser[A]) =
  a ~ b ~ c | a ~ c ~ b | b ~ a ~ c | b ~ c ~ a | c ~ a ~ b | c ~ b ~ a
like image 31
Daniel C. Sobral Avatar answered Oct 20 '22 10:10

Daniel C. Sobral


I would not try to enforce this requirement syntactically. I'd write a production that admits multiple tokens from the set allowed and then use a non-parsing approach to ascertaining the acceptability of the keywords actually given. In addition to allowing a simpler grammar, it will allow you to more easily continue parsing after emitting a diagnostic about the erroneous usage.

Randall Schulz

like image 1
Randall Schulz Avatar answered Oct 20 '22 10:10

Randall Schulz