
Scala Parser that sometimes skips whitespace and sometimes does not

Tags: parsing, scala

I've got a working Scala parser but the solution is not as clean as I would like. The problem is that some of the productions must consider whitespace as part of the token but the "higher-level" productions should be able to ignore/skip the whitespace.

If I use the typical Scala parser pattern of extending the lower-level parsers, then the whitespace settings (whiteSpace / skipWhitespace) are inherited and things get messy very quickly.

I think I would be better off not using the extends approach but rather having an instance of the low-level parser available in the higher-level parser's class -- but I'm not sure how to make that work, such that both parsers would consume one and the same stream of input characters.

Here is part of the lowest-level parser:

class VulgarFractionParser extends RegexParsers {
  override type Elem = Char

  override val whiteSpace = "".r

Then I extend it like this:

class NumberParser extends VulgarFractionParser with Positional {

But at this point the NumberParser must explicitly handle whitespace just like the VulgarFractionParser. For the NumberParser it is still pretty manageable, but at the next level up I really want to be able to just define productions that use whitespace as a separator, the way a normal RegexParsers grammar would.

An example would be something like:

IBM 33.33/ 1200.00
or
IBM 33.33/33.50 1200.00

The second value sometimes has two parts separated by a "/" and sometimes has only a single part, with nothing after the slash (or with no slash at all).

  def bidOrAskPrice = ("$"?) ~> (bidOrAskPrice1 | bidOrAskPrice2 | bidOrAskPrice3)

  def bidOrAskPrice1 = number ~ ("/".r) ~ number ~ (SPACES) ^^ {
    case a ~ slash ~ b ~ sp1 => BidOrAsk(a, Some(b))
  }
  def bidOrAskPrice2 = (number ~ "/" ~ (SPACES)) ^^ { case a ~ slash ~ sp => BidOrAsk(a, None) }
  def bidOrAskPrice3 = (number ~ (SPACES?)) ^^ { case a ~ sp => BidOrAsk(a, None) }
malsmith asked May 30 '12 04:05


2 Answers

One solution is to override the handleWhiteSpace method and switch whitespace skipping on and off with a var in your extended class.

You can see the code of RegexParsers here: https://github.com/scala/scala/blob/v2.9.2/src/library/scala/util/parsing/combinator/RegexParsers.scala
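A minimal sketch of that idea, not tested against the asker's real grammar. The names ToggleWhitespaceParser, skipping, strict, symbol, price and quote are made up for illustration, and the number regex is only a stand-in for the real NumberParser. The strict combinator skips leading whitespace once, then runs the wrapped parser with skipping switched off, so "33.33/33.50" and "33.33/" are accepted but "33.33 / 33.50" is not.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical parser: whitespace skipping is controlled by a var.
class ToggleWhitespaceParser extends RegexParsers {
  // Consulted every time a literal or regex is about to be matched.
  private var skipping = true

  // Skip whitespace only while the flag is on.
  override protected def handleWhiteSpace(source: CharSequence, offset: Int): Int =
    if (skipping) super.handleWhiteSpace(source, offset) else offset

  // Run p with skipping disabled; leading whitespace is consumed first
  // so that p can still follow a whitespace-separated token.
  def strict[T](p: => Parser[T]): Parser[T] = Parser { in =>
    val start = handleWhiteSpace(in.source, in.offset)
    val saved = skipping
    skipping = false
    try p(in.drop(start - in.offset)) finally skipping = saved
  }

  // Stand-ins for the real low-level productions (assumptions).
  def number: Parser[String] = """\d+(\.\d+)?""".r
  def symbol: Parser[String] = """[A-Z]+""".r

  // Whitespace-sensitive: no spaces allowed around the slash.
  def price: Parser[(String, Option[String])] =
    strict(number ~ opt("/" ~> opt(number))) ^^ { case a ~ b => (a, b.flatMap(x => x)) }

  // Whitespace-insensitive: ordinary RegexParsers behaviour between tokens.
  def quote: Parser[(String, (String, Option[String]), String)] =
    symbol ~ price ~ number ^^ { case s ~ p ~ size => (s, p, size) }
}
```

Usage would look like `val p = new ToggleWhitespaceParser; p.parseAll(p.quote, "IBM 33.33/ 1200.00")`. Note that the var makes a parser instance unsafe to share between threads.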

fp4me answered Sep 28 '22 08:09


Doesn't it make more sense to turn the first parser into a token parser (a lexer, really), and make the second parser read that instead of plain Char?
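A rough sketch of what that two-stage setup could look like, under several assumptions: the token types (QuoteToken, Ticker, Num, Slashed), the QuoteLexer / QuoteParser / TokenReader names and the String-based BidOrAsk are all invented here, and the regexes only approximate the question's number production. The lexer decides where whitespace matters (each token is a single regex, so nothing can sneak inside "33.33/"), and the second parser works on tokens and never sees whitespace at all.

```scala
import scala.util.parsing.combinator.{Parsers, RegexParsers}
import scala.util.parsing.input.{NoPosition, Position, Reader}

// Hypothetical token and result types.
sealed trait QuoteToken
case class Ticker(name: String) extends QuoteToken
case class Num(text: String) extends QuoteToken
case class Slashed(bid: String, ask: Option[String]) extends QuoteToken
case class BidOrAsk(bid: String, ask: Option[String])

// Stage 1: characters -> tokens. Whitespace is skipped between tokens only.
object QuoteLexer extends RegexParsers {
  private val num = """\d+(?:\.\d+)?"""
  def slashed: Parser[QuoteToken] = (num + "/(?:" + num + ")?").r ^^ { s =>
    val parts = s.split("/") // "33.33/".split("/") yields a single element
    Slashed(parts(0), if (parts.length > 1) Some(parts(1)) else None)
  }
  def number: Parser[QuoteToken] = num.r ^^ { s => Num(s) }
  def ticker: Parser[QuoteToken] = """[A-Z]+""".r ^^ { s => Ticker(s) }
  def tokens: Parser[List[QuoteToken]] = rep(slashed | number | ticker)
  def lex(s: String): List[QuoteToken] = parseAll(tokens, s) match {
    case Success(ts, _) => ts
    case other          => sys.error(other.toString)
  }
}

// Minimal Reader over a token list (positions are not tracked in this sketch).
class TokenReader(tokens: List[QuoteToken]) extends Reader[QuoteToken] {
  def first: QuoteToken = tokens.head
  def rest: Reader[QuoteToken] = new TokenReader(tokens.tail)
  def pos: Position = NoPosition
  def atEnd: Boolean = tokens.isEmpty
}

// Stage 2: tokens -> result. Elem is a token, not a Char.
object QuoteParser extends Parsers {
  type Elem = QuoteToken

  // Accept one token matched by pf; checks atEnd before touching in.first.
  private def tok[T](what: String)(pf: PartialFunction[QuoteToken, T]): Parser[T] =
    Parser { in =>
      if (in.atEnd) Failure(what + " expected but input ended", in)
      else if (pf.isDefinedAt(in.first)) Success(pf(in.first), in.rest)
      else Failure(what + " expected", in)
    }

  def ticker: Parser[String] = tok("ticker") { case Ticker(n) => n }
  def price: Parser[BidOrAsk] = tok("price") {
    case Slashed(a, b) => BidOrAsk(a, b)
    case Num(a)        => BidOrAsk(a, None)
  }
  def size: Parser[String] = tok("number") { case Num(n) => n }
  def quote: Parser[(String, BidOrAsk, String)] =
    ticker ~ price ~ size ^^ { case t ~ p ~ s => (t, p, s) }

  def parseQuote(s: String): ParseResult[(String, BidOrAsk, String)] =
    phrase(quote)(new TokenReader(QuoteLexer.lex(s)))
}
```

With this split, `QuoteParser.parseQuote("IBM 33.33/33.50 1200.00")` goes through both stages. The library also ships lexical and token-parser base classes that could replace the hand-rolled reader; the version above just keeps the moving parts visible.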

Daniel C. Sobral answered Sep 28 '22 07:09