
connecting a custom tokenizer with a Parsers subclass

Tags: parsing, scala

I'm struggling to understand how the Scala parser combinator API is supposed to be used when you already have a list of tokens (which are not characters). I've looked at the source code for TokenParsers, but I don't understand what the "lexical" member is or how I can plug in my own Reader implementation (or find another way to get my tokens consumed by the parser).

The examples available online (and in the "Programming in Scala" book by Odersky et al.) stop short of showing how to use the API with non-character tokens. Some examples show that a subclass of Parsers must set the Elem type member to the token type, but where do the tokens come from? Where is the Reader[MyToken] input parameter?

Just to clarify: the lexical analysis is already done. Whitespace removal, delimiters, all of that stuff has been done. I have a list of tokens and just want to use the parser combinator niceness to create an AST. The tokens look somewhat like this:

sealed abstract class MyToken {
  val line : Int
  val col : Int
}
case class LPAREN ( line : Int, col : Int ) extends MyToken
case class RPAREN ( line : Int, col : Int ) extends MyToken

Etc.

Asked by esl, Oct 21 '22 16:10


1 Answer

I figured it out eventually. The phrase() method returns a parser that can be applied directly to a Reader[Elem], so I can wrap my token stream in a Reader and pass it in.

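import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.{NoPosition, Position, Reader}

class MyParsers extends Parsers {
  // The parser consumes MyToken values instead of characters.
  type Elem = MyToken

  def parse(tokens: Iterable[MyToken]): ParseResult[Any] = {
    val reader = new MyReader(tokens)
    // phrase() succeeds only if the rule consumes every token.
    phrase(myGrammarRule)(reader)
  }

  // ...etc...
}

// Adapts an existing token sequence to the Reader interface that Parsers expects.
class MyReader(tokens: Iterable[MyToken]) extends Reader[MyToken] {
  // Guard the empty case: the position may be requested at the end of input.
  def pos: Position = if (tokens.isEmpty) NoPosition else tokens.head
  def atEnd: Boolean = tokens.isEmpty
  // By the Reader contract, rest returns this reader once the input is exhausted.
  def rest: Reader[MyToken] = if (tokens.isEmpty) this else new MyReader(tokens.tail)
  def first: MyToken = tokens.head
}

// Each token carries its own position, so it can serve as the Reader's pos.
sealed abstract class MyToken extends Position {
  val _line: Int
  val _col: Int

  override def column: Int = _col
  override def line: Int = _line
  override def lineContents: String = ""
}

case class LPAREN(_line: Int, _col: Int) extends MyToken
case class RPAREN(_line: Int, _col: Int) extends MyToken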

The Position mixin is nice, because it allows the Parser to use the already existing position information in my tokens without any added glue.
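
For anyone who wants to see the pieces working end to end, here is a minimal sketch that fills in the elided grammar with a made-up one: nested parentheses parsed into a hypothetical Group AST node. The names Group, ParenParsers, group, lparen, rparen and Demo are illustrative assumptions and not part of my real parser. It also assumes the scala-parser-combinators module is on the classpath and takes a List instead of an Iterable so that tail is cheap.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.{NoPosition, Position, Reader}

// Tokens double as positions, as in the answer above.
sealed abstract class MyToken extends Position {
  def _line: Int
  def _col: Int
  override def line: Int = _line
  override def column: Int = _col
  override protected def lineContents: String = ""
}
case class LPAREN(_line: Int, _col: Int) extends MyToken
case class RPAREN(_line: Int, _col: Int) extends MyToken

// Hypothetical AST node: a parenthesised group containing nested groups.
case class Group(children: List[Group])

// Wraps an already-lexed token list in the Reader interface.
class MyReader(tokens: List[MyToken]) extends Reader[MyToken] {
  def first: MyToken = tokens.head
  def atEnd: Boolean = tokens.isEmpty
  def rest: Reader[MyToken] = if (atEnd) this else new MyReader(tokens.tail)
  // Fall back to NoPosition when there is no token left to report.
  def pos: Position = tokens.headOption.getOrElse(NoPosition)
}

object ParenParsers extends Parsers {
  type Elem = MyToken

  // accept(expected, pf) matches a single token for which pf is defined.
  private val lparen: Parser[MyToken] = accept("'('", { case t: LPAREN => t })
  private val rparen: Parser[MyToken] = accept("')'", { case t: RPAREN => t })

  // group ::= '(' group* ')'
  def group: Parser[Group] =
    (lparen ~> rep(group) <~ rparen) ^^ (children => Group(children))

  // phrase() fails unless the whole token list is consumed.
  def parse(tokens: List[MyToken]): ParseResult[Group] =
    phrase(group)(new MyReader(tokens))
}

object Demo extends App {
  val tokens = List(LPAREN(1, 1), LPAREN(1, 2), RPAREN(1, 3), RPAREN(1, 4))
  ParenParsers.parse(tokens) match {
    case ParenParsers.Success(ast, _) =>
      println(ast)
    case failure: ParenParsers.NoSuccess =>
      println(s"${failure.msg} at ${failure.next.pos}")
  }
}
```

With the four tokens above the parse should succeed and yield one Group nested inside another. Dropping the final RPAREN should produce a failure instead, and the reported position comes straight from the Reader's pos.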

Answered by esl, Oct 28 '22 23:10