
Scala Parser Combinators: Parsing in a stream

I'm using the native parser combinator library in Scala, and I'd like to use it to parse a number of large files. I have my combinators set up, but the file that I'm trying to parse is too large to be read into memory all at once. I'd like to be able to stream from an input file through my parser and write the results back to disk, so that I never need to hold the whole file in memory.

My current system looks something like this:

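import scala.io.Source

val f = Source.fromFile("myfile")
// Parse one or more documents from the file's reader, then write each one out
parser.parse(parser.document.+, f.reader()).get.foreach { _.writeToFile }
f.close()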

This reads the whole file in as it parses, which I'd like to avoid.

John Sullivan asked Mar 23 '23 07:03


1 Answer

There is no easy or built-in way to accomplish this using Scala's parser combinators, which provide a facility for implementing parsing expression grammars.

Operators such as ||| (longest match) are largely incompatible with a stream parsing model, as they require extensive backtracking capabilities: the parser has to keep any input it may need to re-read, so it cannot discard what it has already consumed. In order to accomplish what you are trying to do, you would need to re-formulate your grammar such that no backtracking is required, ever. This is generally much harder than it sounds.

As mentioned by others, your best bet would be to look into a preliminary phase where you chunk your input (e.g. by line) so that you can handle a portion of the stream at a time.
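The chunking approach could be sketched roughly as follows. This is only an illustration under assumptions, not a definitive implementation: it assumes each line of the input is a complete unit for the grammar, and the names ChunkedParsing, parseByLine and parseChunk are hypothetical. The per-chunk parser is passed in as a plain function so that the sketch depends only on the standard library.

```scala
import java.io.Writer

object ChunkedParsing {
  /** Parses the input one chunk (here: one line) at a time and writes each
    * result to `out` as soon as it is produced, so only the current chunk
    * needs to be in memory. Returns the number of chunks that failed to parse.
    *
    * `parseChunk` is a hypothetical hook: Right(value) on success,
    * Left(message) on failure.
    */
  def parseByLine[A](lines: Iterator[String], out: Writer)(
      parseChunk: String => Either[String, A]): Int = {
    var failures = 0
    for (line <- lines) parseChunk(line) match {
      case Right(value) =>
        out.write(value.toString)
        out.write("\n")
      case Left(_) =>
        failures += 1 // skip the bad chunk and keep going
    }
    failures
  }
}
```

With the parser from the question, parseChunk might wrap something like parser.parseAll(parser.document, line), mapping a successful result to Right and a failed one to Left, and lines would come from Source.fromFile("myfile").getLines(), which reads lazily. If a document can span several lines, the chunking step would instead have to group lines up to whatever delimiter separates documents.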

J Cracknell answered Apr 01 '23 07:04