How to parse placeholders from text without throwing away your sword so you can fight off the marauders with a lampshade

I needed to parse placeholders out of text like abc $$FOO$$ cba. I hacked something together with Scala's parser combinators, but I'm not really happy with the solution.

In particular, I resorted to a zero-width lookahead in the regular expression, (?=(\$\$|\z)), to stop parsing the text and start parsing the placeholders. This sounds perilously close to the shenanigans discussed and colorfully dismissed on the Scala mailing list (which inspired the title of this question).
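
To make the lookahead's effect concrete, here is a minimal sketch that applies the same pattern directly with scala.util.matching.Regex, outside the parser. The object name LookaheadDemo is made up for illustration; only the pattern itself comes from the parser in this question.

```scala
// Hypothetical demo object; only the pattern is taken from the question's `text` parser.
object LookaheadDemo {
  val textRegex = """(?ims).+?(?=(\$\$|\z))""".r

  def main(args: Array[String]): Unit = {
    // The reluctant `.+?` stops as soon as the lookahead sees "$$" or end of input.
    println(textRegex.findPrefixOf("abc $$FOO$$ cba")) // Some(abc )
    println(textRegex.findPrefixOf(" cba"))            // Some( cba)
    // The pattern also matches input that begins with a delimiter.
    println(textRegex.findPrefixOf("$$FOO$$ cba"))     // Some($$FOO)
  }
}
```

The last call shows that the pattern alone does not reject text starting with $$, so the grammar depends on the placeholder alternative winning the alternation at that position.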

So, the challenge: fix my parser to work without this hack. I'd like to see a clear progression from the problem to your solution, so I can replace my strategy of randomly assembling combinators until tests pass.

import scala.util.parsing.combinator.RegexParsers

object PlaceholderParser extends RegexParsers {
  sealed abstract class Element
  case class Text(text: String) extends Element
  case class Placeholder(key: String) extends Element

  override def skipWhitespace = false

  def parseElements(text: String): List[Element] = parseAll(elements, text) match {
    case Success(es, _) => es
    case NoSuccess(msg, _) => sys.error("Could not parse: [%s]. Error: %s".format(text, msg))
  }

  def parseElementsOpt(text: String): ParseResult[List[Element]] = parseAll(elements, text)

  lazy val elements: Parser[List[Element]] = rep(element)
  lazy val element: Parser[Element] = placeholder ||| text
  lazy val text: Parser[Text] = """(?ims).+?(?=(\$\$|\z))""".r ^^ Text.apply
  lazy val placeholder: Parser[Placeholder] = delimiter ~> """[\w. ]+""".r <~ delimiter ^^ Placeholder.apply
  lazy val delimiter: Parser[String] = literal("$$")
}


import org.junit.{Assert, Test}
import PlaceholderParser._

class PlaceholderParserTest {
  @Test
  def parse1 = check("a quick brown $$FOX$$ jumped over the lazy $$DOG$$")(Text("a quick brown "), Placeholder("FOX"), Text(" jumped over the lazy "), Placeholder("DOG"))

  @Test
  def parse2 = check("a quick brown $$FOX$$!")(Text("a quick brown "), Placeholder("FOX"), Text("!"))

  @Test
  def parse3 = check("a quick brown $$FOX$$!\n!")(Text("a quick brown "), Placeholder("FOX"), Text("!\n!"))

  @Test
  def parse4 = check("a quick brown $$F.O X$$")(Text("a quick brown "), Placeholder("F.O X"))

  def check(text: String)(expected: Element*) = Assert.assertEquals(expected.toList, parseElements(text))
}
retronym asked Nov 05 '22 10:11

1 Answer

I found another approach. There's no regex hack anymore, but the code is a little longer. It parses the whole string into a list of single-character strings and Placeholder objects. The compact function then merges each run of consecutive strings into one Text object and leaves the Placeholder objects untouched:

import scala.util.parsing.combinator.RegexParsers

object PlaceholderParser extends RegexParsers {
  sealed abstract class Element
  case class Text(text: String) extends Element
  case class Placeholder(key: String) extends Element

  override def skipWhitespace = false

  def parseElements(text: String): List[Element] = parseAll(elements, text) match {
    case Success(es, _) => es
    case NoSuccess(msg, _) => sys.error("Could not parse: [%s]. Error: %s".format(text, msg))
  }

  def parseElementsOpt(text: String): ParseResult[List[Element]] = parseAll(elements, text)

  // Merges runs of single-character strings into Text elements;
  // Placeholder objects pass through unchanged.
  def compact(parts: List[Any]): List[Element] = {
    val builder = new StringBuilder
    val result = List.newBuilder[Element]
    // Emits the buffered characters (if any) as one Text element.
    def flushText(): Unit =
      if (builder.nonEmpty) {
        result += Text(builder.toString)
        builder.clear()
      }
    parts.foreach {
      case s: String => builder.append(s)
      case p: Placeholder =>
        flushText()
        result += p
    }
    flushText()
    result.result()
  }

  lazy val elements: Parser[List[Element]] = (placeholder ||| text).+ ^^ compact
  lazy val text: Parser[String] = """(?ims).""".r
  lazy val placeholder: Parser[Placeholder] = delimiter ~> """[\w. ]+""".r <~ delimiter ^^ Placeholder.apply
  lazy val delimiter: Parser[String] = literal("$$")
}

It's not a perfect solution, but maybe something you can start with.
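
A possible variation on the same idea, offered only as a sketch: the library's `not` combinator can reject a delimiter before each text character is consumed, so runs of text can be collected with `rep1` and no separate compaction pass is needed. The names NotPlaceholderParser and textChar are made up for illustration, and the code assumes the scala-parser-combinators module is on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical variant of the parser above; not the original answer's code.
object NotPlaceholderParser extends RegexParsers {
  sealed abstract class Element
  case class Text(text: String) extends Element
  case class Placeholder(key: String) extends Element

  override def skipWhitespace = false

  lazy val delimiter: Parser[String] = literal("$$")

  // One character (newlines included) that is not the start of a delimiter.
  // `not` succeeds without consuming input when `delimiter` fails.
  lazy val textChar: Parser[String] = not(delimiter) ~> """(?s).""".r

  // One or more such characters, joined into a single Text element.
  lazy val text: Parser[Text] = rep1(textChar) ^^ (cs => Text(cs.mkString))

  lazy val placeholder: Parser[Placeholder] =
    delimiter ~> """[\w. ]+""".r <~ delimiter ^^ Placeholder.apply

  lazy val elements: Parser[List[Element]] = rep(placeholder | text)

  def parseElementsOpt(input: String): ParseResult[List[Element]] =
    parseAll(elements, input)
}
```

With this sketch, `NotPlaceholderParser.parseElementsOpt("abc $$FOO$$ cba").get` yields `List(Text("abc "), Placeholder("FOO"), Text(" cba"))`. One behavioral difference to be aware of: an unclosed delimiter such as `abc $$FOO` fails to parse here, because neither alternative can consume the `$$`.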

Michel Krämer answered Dec 09 '22 10:12