 

Scala: Using StandardTokenParsers for parsing hexadecimal numbers

Tags: parsing, scala

I am using Scala parser combinators by extending scala.util.parsing.combinator.syntactical.StandardTokenParsers. This class provides the following methods:

def ident: Parser[String] for parsing identifiers, and

def numericLit: Parser[String] for parsing a number (decimal, I suppose).

For lexing, I am using the scala.util.parsing.combinator.lexical.Scanners from scala.util.parsing.combinator.lexical.StdLexical.

My requirement is to parse a hexadecimal number (without the 0x prefix) of any length, basically a grammar like ([0-9]|[a-f])+.

I tried integrating RegexParsers, but ran into type issues. Other attempts, such as extending the lexer's delimiters and the grammar rules, end in "token not found" errors.

asked Aug 13 '10 at 16:08 by thequark


2 Answers

As I suspected, the problem can be solved by extending the lexer rather than the parser. The standard lexer accepts only decimal digits, so I created a new lexer that also treats hex letters as digits:

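import scala.util.parsing.combinator.lexical.StdLexical

class MyLexer extends StdLexical {
  // Elem is already Char in the lexical Scanners, so it needs no override.
  // Accept a hex letter wherever the standard lexer expects a decimal digit.
  override def digit: Parser[Char] = super.digit | hexDigit
  lazy val hexDigits: Set[Char] = "0123456789abcdefABCDEF".toSet
  lazy val hexDigit: Parser[Char] = elem("hex digit", hexDigits.contains(_))
}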
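To see what the modified lexer actually produces, the sketch below runs it on its own and labels each token. It assumes the scala-parser-combinators library is on the classpath; LexDemo and tokens are hypothetical names. It also exposes a caveat: StdLexical tries its identifier rule before its number rule, so a hex value that starts with a letter (such as ff) comes out as an Identifier rather than a NumericLit.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.Reader

// The answer's lexer, restated so this sketch compiles on its own.
class MyLexer extends StdLexical {
  override def digit: Parser[Char] = super.digit | hexDigit
  lazy val hexDigits: Set[Char] = "0123456789abcdefABCDEF".toSet
  lazy val hexDigit: Parser[Char] = elem("hex digit", hexDigits.contains(_))
}

object LexDemo {
  val lexer = new MyLexer

  // Hypothetical helper: lex the whole input and describe each token as Kind(chars).
  def tokens(input: String): List[String] = {
    var in: Reader[lexer.Token] = new lexer.Scanner(input)
    val out = List.newBuilder[String]
    while (!in.atEnd) {
      val kind = in.first match {
        case lexer.NumericLit(_) => "NumericLit"
        case lexer.Identifier(_) => "Identifier"
        case _                   => "Other"
      }
      out += s"$kind(${in.first.chars})"
      in = in.rest
    }
    out.result()
  }
}
```

For example, LexDemo.tokens("1a2b 3F ff") should yield NumericLit(1a2b), NumericLit(3F) and Identifier(ff).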

My parser (which has to be a StandardTokenParsers) then plugs in the new lexer:

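import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object ParseAST extends StandardTokenParsers {
  // Use the hex-aware lexer instead of the default StdLexical.
  override val lexical: MyLexer = new MyLexer()
  lexical.delimiters ++= List("(", ")", ",", "@")
  ...
}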

StdLexical already takes care of building the "number" from the digits:

class StdLexical {
  ...

  def token: Parser[Token] =
    ...
    | digit ~ rep(digit) ^^ { case first ~ rest => NumericLit(first :: rest mkString "") }
}

Since StdLexical returns the parsed number only as a String, this is not a problem for me: I am not interested in the numeric value either.
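A minimal end-to-end sketch of the parser side, assuming scala-parser-combinators is on the classpath. HexListParser, hexList, hexValues and parseWith are hypothetical names, and the parenthesized-list grammar is invented for illustration. hexList keeps the raw digit strings from numericLit, as described above; hexValues shows how BigInt(s, 16) could turn each one into a value of any length if it is ever needed.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// The answer's lexer, restated so this sketch compiles on its own.
class MyLexer extends StdLexical {
  override def digit: Parser[Char] = super.digit | hexDigit
  lazy val hexDigits: Set[Char] = "0123456789abcdefABCDEF".toSet
  lazy val hexDigit: Parser[Char] = elem("hex digit", hexDigits.contains(_))
}

object HexListParser extends StandardTokenParsers {
  override val lexical: MyLexer = new MyLexer
  lexical.delimiters ++= List("(", ")", ",")

  // Hypothetical rule: "(" hex { "," hex } ")", keeping each literal as text.
  def hexList: Parser[List[String]] = "(" ~> repsep(numericLit, ",") <~ ")"

  // Same rule, converting each literal to an arbitrary-size integer.
  def hexValues: Parser[List[BigInt]] =
    "(" ~> repsep(numericLit ^^ (s => BigInt(s, 16)), ",") <~ ")"

  // Hypothetical helper: parse the whole input, or return None on failure.
  def parseWith[T](p: Parser[T], input: String): Option[T] =
    phrase(p)(new lexical.Scanner(input)) match {
      case Success(result, _) => Some(result)
      case _                  => None
    }
}
```

Because StdLexical tries identifiers first, an input such as (ff) is rejected by this grammar: ff reaches the parser as an identifier, not a numeric literal.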

answered Oct 23 '22 at 00:10 by thequark


You can use RegexParsers with an action attached to the token in question.

import scala.util.parsing.combinator._

object HexParser extends RegexParsers {
  // One or more lower-case hex digits, converted to an Int.
  val hexNum: Parser[Int] = """[0-9a-f]+""".r ^^ { s => Integer.parseInt(s, 16) }

  // A comma-separated list of hex numbers.
  def seq: Parser[List[Int]] = repsep(hexNum, ",")
}

This defines a parser that reads comma-separated hex numbers without a 0x prefix, and it actually returns Int values.

val result = HexParser.parse(HexParser.seq, "1, 2, f, 10, 1a2b34d")
scala> println(result)
[1.21] parsed: List(1, 2, 15, 16, 27439949)

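Note that there is no way to distinguish numbers in decimal notation: "10" is always read as hex, i.e. 16. Also, because I'm using Integer.parseInt, the result is limited to the range of an Int (longer input makes it throw a NumberFormatException). To support any length, you may have to write your own conversion using BigInteger or arrays.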
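One way to lift the Int limit, sketched here as an assumption rather than the answer's own code: Scala's BigInt(s, 16) accepts a hex string of any length, so it can replace Integer.parseInt in the action. BigHexParser is a hypothetical name, and the regex is widened to accept upper-case digits as well.

```scala
import scala.util.parsing.combinator.RegexParsers

object BigHexParser extends RegexParsers {
  // Upper- or lower-case hex digits; BigInt has no fixed size limit.
  val hexNum: Parser[BigInt] = """[0-9a-fA-F]+""".r ^^ (s => BigInt(s, 16))

  // RegexParsers skips whitespace by default, so "a, b" works as well as "a,b".
  def seq: Parser[List[BigInt]] = repsep(hexNum, ",")
}
```

Usage: BigHexParser.parseAll(BigHexParser.seq, "ff, 1a2b34d, ffffffffffffffffffff") should succeed with 255, 27439949 and 2^80 - 1, the last of which would not fit in an Int or even a Long.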

answered Oct 23 '22 at 01:10 by Thomas