I'm trying to capture parts of a multi-lined string with a regex in Scala. The input is of the form:
val input = """some text
              |begin {
              |  content to extract
              |  content to extract
              |}
              |some text
              |begin {
              |  other content to extract
              |}
              |some text""".stripMargin
I've tried several possibilities that should get me the text out of the begin { } blocks. One of them:
val Block = """(?s).*begin \{(.*)\}""".r
input match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
}
I get a NO MATCH. If I drop the \} the regex looks like (?s).*begin \{(.*) and it matches the last block including the unwanted } and "some text". I checked my regex at rubular.com as with /.*begin \{(.*)\}/m and it matches at least one block. I thought when my Scala regex would match the same I could start using findAllIn to match all blocks. What am I doing wrong?
I had a look at Scala Regex enable Multiline option but I could not manage to capture all the occurrences of the text blocks in, for example, a Seq[String].
Any help is appreciated.
As Alex has said, when using pattern matching to extract fields from regular expressions, the pattern acts as if it was bounded (ie, using ^ and $). The usual way to avoid this problem is to use findAllIn first. This way:
val input = """some text
              |begin {
              |  content to extract
              |  content to extract
              |}
              |some text
              |begin {
              |  other content to extract
              |}
              |some text""".stripMargin
val Block = """(?s)begin \{(.*)\}""".r
Block findAllIn input foreach (_ match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
})
Otherwise, you can use .* at the beginning and end to get around that restriction:
val Block = """(?s).*begin \{(.*)\}.*""".r
input match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
}
By the way, you probably want a non-eager matcher:
val Block = """(?s)begin \{(.*?)\}""".r
Block findAllIn input foreach (_ match {
  case Block(content) => println(content)
  case _ => println("NO MATCH")
})
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With