Scala parser combinator, large file issue

I have written a parser as follows:

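import scala.util.parsing.combinator.JavaTokenParsers

class LogParser extends JavaTokenParsers {

  // Discards three leading numbers, then reads every postings list.
  def invertedIndex: Parser[Array[Array[(Int, Int)]]] = {
    num ~> num ~> num ~> rep(postingsList) ^^ {
      _.toArray
    }
  }

  // Discards one leading number, then reads the entries that follow it.
  def postingsList: Parser[Array[(Int, Int)]] = {
    num ~> rep(entry) ^^ {
      _.toArray
    }
  }

  // A single "docID,count" pair; num already yields Int, so no conversion is needed.
  def entry: Parser[(Int, Int)] = {
    num ~ "," ~ num ^^ {
      case docID ~ _ ~ count => (docID, count)
    }
  }

  def num: Parser[Int] = wholeNumber ^^ (_.toInt)

}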

If I parse a 270 MB file with a FileReader as follows:

val index = parseAll(invertedIndex, new FileReader("path/to/file")).get

I get Exception in thread "main" java.lang.StackOverflowError (I have also tried wrapping the FileReader in a BufferedReader). I can fix it by first reading the file into a String, like so:

val input = io.Source.fromFile("path/to/file")
val str = input.mkString
input.close()
val index = parseAll(invertedIndex, str).get

Why is this the case? Is there any way to avoid reading the file into a String first? It seems like a waste.

Robert asked Nov 03 '12 21:11


1 Answer

There is another library, gll-combinators [1], that works much like the Scala parser combinators and supports trampolining, which is what you need to stop the StackOverflowError.

[1] https://github.com/djspiewak/gll-combinators
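
To illustrate what trampolining means in general, here is a minimal sketch using scala.util.control.TailCalls from the standard library. It does not use the gll-combinators API, and the names TrampolineSketch, isEven and isOdd are made up for the example. Each step returns a description of the next call instead of calling it directly, and a loop inside result runs those steps one at a time, so the JVM stack does not grow with the size of the input.

```scala
import scala.util.control.TailCalls.{TailRec, done, tailcall}

object TrampolineSketch {
  // Hypothetical example functions. Written as plain mutual recursion,
  // these would add a stack frame per step, because scalac only
  // optimizes self-recursive tail calls.
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))

  def main(args: Array[String]): Unit = {
    // result drives the suspended calls in a loop rather than on the stack.
    println(isEven(1000000).result)
  }
}
```

A trampolined parser library applies the same idea to its combinators, so that a long repetition over a large input is driven by a loop on the heap rather than by nested method calls.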

Ivan Meredith answered Sep 19 '22 18:09