Scala Parser Combinators - consume until match

I'm working with the native parser combinator library in Scala and I'd like to parse some parts of my input, but not others. Specifically, I'd like to discard all of the arbitrary text between inputs that I care about. For example, with this input:

begin

Text I care about
Text I care about

DONT CARE

Text I don't care about

begin

More text I care about
...

Right now I have:

object MyParser extends RegexParsers {
    val beginToken: Parser[String] = "begin"
    val dontCareToken: Parser[String] = "DONT CARE"
    val text: Parser[String] = not(dontCareToken) ~> """([^\n]+)""".r

    val document: Parser[String] = beginToken ~> text.+ <~ dontCareToken ^^ { _.mkString("\n") }
    val documents: Parser[Iterable[String]] = document.+
}

but I'm not sure how to ignore the text that comes after DONT CARE and until the next begin. Specifically, I don't want to make any assumptions about the form of that text, I just want to start parsing again at the next begin statement.

John Sullivan asked Feb 26 '26 13:02


1 Answer

You almost had it. Parse the text you don't care about as well, and then discard the result.

I added two parsers, dontCareText and skipDontCare, and made skipDontCare optional at the end of your document parser.

import scala.util.parsing.combinator.RegexParsers

object MyParser extends RegexParsers {
    val beginToken: Parser[String] = "begin"
    val dontCareToken: Parser[String] = "DONT CARE"
    // A line of wanted text: any line that does not start with DONT CARE
    val text: Parser[String] = not(dontCareToken) ~> """([^\n]+)""".r
    // A line of ignored text: any line that does not start with begin
    val dontCareText: Parser[String] = not(beginToken) ~> """([^\n]+)""".r
    // DONT CARE plus every ignored line up to the next begin; the result is thrown away
    val skipDontCare: Parser[String] = dontCareToken ~ rep(dontCareText) ^^ { _ => "" }

    val document: Parser[String] = 
      beginToken ~> text.+ <~ opt(skipDontCare) ^^ { 
        _.mkString("\n") 
      }
    val documents: Parser[Iterable[String]] = document.+
}


val s = """begin

Text I care about
Text I care about

DONT CARE

Text I don't care about

begin

More text I care about
"""

MyParser.parseAll(MyParser.documents,s)
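
If you would rather not parse the ignored text line by line, here is a rough sketch of another option: consume everything after DONT CARE with a single lazy regex that stops in front of the next line starting with begin (or at the end of the input). The names SkippingParser and careText are made up for this example, and I am assuming the default whitespace skipping of RegexParsers. It also shows one way to get the parsed documents out of the ParseResult.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical variant of MyParser; the object and method names are illustrative only.
object SkippingParser extends RegexParsers {
  val beginToken: Parser[String] = "begin"
  val dontCareToken: Parser[String] = "DONT CARE"

  // A wanted line: anything that starts with neither DONT CARE nor begin.
  val text: Parser[String] =
    not(dontCareToken | beginToken) ~> """[^\n]+""".r

  // (?s) lets '.' match newlines and (?m) lets '^' match at line starts.
  // The lazy '.*?' stops before the next line that starts with "begin",
  // or at the end of the input if there is no such line.
  val skipDontCare: Parser[String] =
    dontCareToken ~> """(?ms).*?(?=^begin|\z)""".r

  val document: Parser[String] =
    beginToken ~> rep1(text) <~ opt(skipDontCare) ^^ { _.mkString("\n") }

  val documents: Parser[List[String]] = rep1(document)

  // Turn the ParseResult into an Either: the documents or the error message.
  def careText(input: String): Either[String, List[String]] =
    parseAll(documents, input) match {
      case Success(docs, _)   => Right(docs)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

Keep in mind that both this sketch and the parser above treat any line beginning with the letters "begin" (for example "beginning ...") as the start of a new document, so you may want a stricter beginToken if your real input can contain such lines.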
Keith Pinson answered Mar 01 '26 04:03
