I needed to parse placeholders out of text like abc $$FOO$$ cba. I hacked together something with Scala's parser combinators, but I'm not really happy with the solution.
In particular, I resorted to a zero-width matcher in the regular expression, (?=(\$\$|\z)), to stop parsing the text and start parsing the placeholders. This sounds perilously close to the shenanigans discussed and colorfully dismissed on the Scala mailing list (which inspired the title of this question).
So, the challenge: fix my parser to work without this hack. I'd like to see a clear progression from the problem to your solution, so I can replace my strategy of randomly assembling combinators until tests pass.
import scala.util.parsing.combinator.RegexParsers

object PlaceholderParser extends RegexParsers {
  sealed abstract class Element
  case class Text(text: String) extends Element
  case class Placeholder(key: String) extends Element

  // Whitespace is significant inside text, so do not skip it.
  override def skipWhitespace = false

  def parseElements(text: String): List[Element] = parseAll(elements, text) match {
    case Success(es, _) => es
    // Predef.error no longer exists in current Scala versions; sys.error replaces it.
    case NoSuccess(msg, _) => sys.error("Could not parse: [%s]. Error: %s".format(text, msg))
  }

  def parseElementsOpt(text: String): ParseResult[List[Element]] = parseAll(elements, text)

  lazy val elements: Parser[List[Element]] = rep(element)
  lazy val element: Parser[Element] = placeholder ||| text
  // The hack: the lookahead stops the text match just before "$$" or at end of input.
  lazy val text: Parser[Text] = """(?ims).+?(?=(\$\$|\z))""".r ^^ Text.apply
  lazy val placeholder: Parser[Placeholder] = delimiter ~> """[\w. ]+""".r <~ delimiter ^^ Placeholder.apply
  lazy val delimiter: Parser[String] = literal("$$")
}
import org.junit.{Assert, Test}
import PlaceholderParser._

class PlaceholderParserTest {
  @Test
  def parse1 = check("a quick brown $$FOX$$ jumped over the lazy $$DOG$$")(Text("a quick brown "), Placeholder("FOX"), Text(" jumped over the lazy "), Placeholder("DOG"))

  @Test
  def parse2 = check("a quick brown $$FOX$$!")(Text("a quick brown "), Placeholder("FOX"), Text("!"))

  @Test
  def parse3 = check("a quick brown $$FOX$$!\n!")(Text("a quick brown "), Placeholder("FOX"), Text("!\n!"))

  @Test
  def parse4 = check("a quick brown $$F.O X$$")(Text("a quick brown "), Placeholder("F.O X"))

  def check(text: String)(expected: Element*) = Assert.assertEquals(expected.toList, parseElements(text))
}
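To make it easier to see what the zero-width lookahead does, here is a small sketch that uses only scala.util.matching.Regex from the standard library. LookaheadDemo and textPrefix are names made up for this illustration; findPrefixOf is used on the assumption that it mirrors how RegexParsers applies a regex, namely as a prefix match on the remaining input.

```scala
import scala.util.matching.Regex

object LookaheadDemo {
  // Same pattern as the text parser: lazily consume characters until the
  // lookahead sees "$$" or the end of input. The lookahead consumes nothing.
  val textRegex: Regex = """(?ims).+?(?=(\$\$|\z))""".r

  // Hypothetical helper: the prefix of `input` that the text regex would match.
  def textPrefix(input: String): Option[String] = textRegex.findPrefixOf(input)

  def main(args: Array[String]): Unit = {
    println(textPrefix("a quick brown $$FOX$$!")) // Some(a quick brown )
    println(textPrefix("!\n!"))                   // matches through the newline to end of input
    println(textPrefix("$$FOX$$ jumped"))         // Some($$FOX)
  }
}
```

The last call shows that the text regex on its own would swallow an opening delimiter, so the parser depends on placeholder being preferred whenever it matches (with ||| it wins as the longer match).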
I found another approach. There's no regex hack anymore, but the code is a little bit longer. It parses the whole string into a list of single-character strings and Placeholder objects. The compact function then compacts the list (i.e. it merges consecutive strings into Text objects and leaves the Placeholder objects untouched):
import scala.util.parsing.combinator.RegexParsers

object PlaceholderParser extends RegexParsers {
  sealed abstract class Element
  case class Text(text: String) extends Element
  case class Placeholder(key: String) extends Element

  override def skipWhitespace = false

  def parseElements(text: String): List[Element] = parseAll(elements, text) match {
    case Success(es, _) => es
    case NoSuccess(msg, _) => sys.error("Could not parse: [%s]. Error: %s".format(text, msg))
  }

  def parseElementsOpt(text: String): ParseResult[List[Element]] = parseAll(elements, text)

  // Merges runs of single-character strings into Text; Placeholders pass through.
  def compact(items: List[Any]): List[Element] = {
    val builder = new StringBuilder()
    val r = items.foldLeft(List.empty[Element])((acc, e) => e match {
      case s: String =>
        // Buffer characters until the next placeholder (or the end of the list).
        builder.append(s)
        acc
      case p: Placeholder =>
        // Flush any buffered text before emitting the placeholder.
        val flushed = if (builder.nonEmpty) {
          val k = acc ++ List(Text(builder.toString))
          builder.clear()
          k
        } else {
          acc
        }
        flushed ++ List(p)
    })
    // Flush trailing text that follows the last placeholder.
    if (builder.nonEmpty) r ++ List(Text(builder.toString)) else r
  }

  lazy val elements: Parser[List[Element]] = (placeholder ||| text).+ ^^ compact
  // Matches exactly one character, including newlines.
  lazy val text: Parser[String] = """(?ims).""".r
  lazy val placeholder: Parser[Placeholder] = delimiter ~> """[\w. ]+""".r <~ delimiter ^^ Placeholder.apply
  lazy val delimiter: Parser[String] = literal("$$")
}
It's not a perfect solution, but maybe something you can start with.
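If the mutable StringBuilder bothers you, the compaction step could also be sketched as a plain foldRight. This is only an illustration of the same idea, not a drop-in replacement: CompactSketch is a made-up wrapper object so the snippet runs without the parser library, and unlike the version above it throws an IllegalArgumentException (rather than a MatchError) on unexpected items.

```scala
object CompactSketch {
  sealed abstract class Element
  case class Text(text: String) extends Element
  case class Placeholder(key: String) extends Element

  // Walk the list from the right: a String is either prepended to the Text
  // already at the head of the accumulator or starts a new Text; Placeholders
  // are kept as they are.
  def compact(items: List[Any]): List[Element] =
    items.foldRight(List.empty[Element]) {
      case (s: String, Text(rest) :: tail) => Text(s + rest) :: tail
      case (s: String, acc)                => Text(s) :: acc
      case (p: Placeholder, acc)           => p :: acc
      case (other, _) =>
        throw new IllegalArgumentException("Unexpected item: " + other)
    }

  def main(args: Array[String]): Unit = {
    // List(Text(ab), Placeholder(X), Text(c))
    println(compact(List("a", "b", Placeholder("X"), "c")))
  }
}
```

Working from the right means every step only prepends to the accumulator, so there is no repeated list concatenation, though each merge still copies the accumulated string.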