Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple String template replacement in Scala and Clojure

Below are functions written in Scala and Clojure to do simple replacement of templates in Strings. The input to each function is a String containing templates of the form {key} and a map from Symbol/Keyword to replacement value.

For example:

Scala:

replaceTemplates("This is a {test}", Map('test -> "game"))

Clojure:

(replace-templates "This is a {test}" {:test "game"})

will return "This is a game".

The input map uses Symbols/Keywords so that I don't have to deal with corner cases where the templates in the Strings contain braces.

Unfortunately, the algorithm is not very efficient.

Here is the Scala code:

def replaceTemplates(text: String,
                     templates: Map[Symbol, String]): String = {
  val builder = new StringBuilder(text)

  @tailrec
  def loop(key: String,
           keyLength: Int,
           value: String): StringBuilder = {
    val index = builder.lastIndexOf(key)
    if (index < 0) builder
    else {
      builder.replace(index, index + keyLength, value)
      loop(key, keyLength, value)
    }
  }

  templates.foreach {
    case (key, value) =>
      val template = "{" + key.name + "}"
      loop(template, template.length, value)
  }

  builder.toString
}

and here is the Clojure code:

(defn replace-templates
  "Return a String with each occurrence of a substring of the form {key}
   replaced with the corresponding value from a map parameter.
   @param str the String in which to do the replacements
   @param m a map of keyword->value"
  [text m]
  (let [sb (StringBuilder. text)]
    (letfn [(replace-all [key key-length value]
              (let [index (.lastIndexOf sb key)]
                (if (< index 0)
                  sb
                  (do
                    (.replace sb index (+ index key-length) value)
                    (recur key key-length value)))))]
      (doseq [[key value] m]
        (let [template (str "{" (name key) "}")]
          (replace-all template (count template) value))))
    (.toString sb)))

Here is a test case (Scala code):

replaceTemplates("""
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque
elit nisi, egestas et tincidunt eget, {foo} mattis non erat. Aenean ut
elit in odio vehicula facilisis. Vestibulum quis elit vel nulla
interdum facilisis ut eu sapien. Nullam cursus fermentum
sollicitudin. Donec non congue augue. {bar} Vestibulum et magna quis
arcu ultricies consectetur auctor vitae urna. Fusce hendrerit
facilisis volutpat. Ut lectus augue, mattis {baz} venenatis {foo}
lobortis sed, varius eu massa. Ut sit amet nunc quis velit hendrerit
bibendum in eget nibh. Cras blandit nibh in odio suscipit eget aliquet
tortor placerat. In tempor ullamcorper mi. Quisque egestas, metus eu
venenatis pulvinar, sem urna blandit mi, in lobortis augue sem ut
dolor. Sed in {bar} neque sapien, vitae lacinia arcu. Phasellus mollis
blandit commodo.
""", Map('foo -> "HELLO", 'bar -> "GOODBYE", 'baz -> "FORTY-TWO"))

and the output:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque
elit nisi, egestas et tincidunt eget, HELLO mattis non erat. Aenean ut
elit in odio vehicula facilisis. Vestibulum quis elit vel nulla
interdum facilisis ut eu sapien. Nullam cursus fermentum
sollicitudin. Donec non congue augue. GOODBYE Vestibulum et magna quis
arcu ultricies consectetur auctor vitae urna. Fusce hendrerit
facilisis volutpat. Ut lectus augue, mattis FORTY-TWO venenatis HELLO
lobortis sed, varius eu massa. Ut sit amet nunc quis velit hendrerit
bibendum in eget nibh. Cras blandit nibh in odio suscipit eget aliquet
tortor placerat. In tempor ullamcorper mi. Quisque egestas, metus eu
venenatis pulvinar, sem urna blandit mi, in lobortis augue sem ut
dolor. Sed in GOODBYE neque sapien, vitae lacinia arcu. Phasellus mollis
blandit commodo.

The algorithm transverses the input map and for each pair, does a replacement in the input String, temporarily held in a StringBuilder. For each key/value pair, we search for the last occurrence of the key (enclosed in braces) and replace it with the value, until there are no more occurrences.

Does it make any performance difference if we use .lastIndexOf versus .indexOf in the StringBuilder?

How can the algorithm be improved? Is there a more idiomatic way to write the Scala and/or Clojure code?

UPDATE: See my follow-up.

UPDATE 2: Here is a better Scala implementation; O(n) in the length of the String. Note that I modified the Map to be [String, String] instead of [Symbol, String] on the recommendation of several people. (thanks mikera, kotarak):

/**
 * Replace templates of the form {key} in the input String with values from the Map.
 *
 * @param text the String in which to do the replacements
 * @param templates a Map from Symbol (key) to value
 * @returns the String with all occurrences of the templates replaced by their values
 */
def replaceTemplates(text: String,
                     templates: Map[String, String]): String = {
  val builder = new StringBuilder
  val textLength = text.length

  @tailrec
  def loop(text: String): String = {
    if (text.length == 0) builder.toString
    else if (text.startsWith("{")) {
      val brace = text.indexOf("}")
      if (brace < 0) builder.append(text).toString
      else {
        val replacement = templates.get(text.substring(1, brace)).orNull
          if (replacement != null) {
            builder.append(replacement)
            loop(text.substring(brace + 1))
          } else {
            builder.append("{")
            loop(text.substring(1))
          }
      }
    } else {
      val brace = text.indexOf("{")
      if (brace < 0) builder.append(text).toString
      else {
        builder.append(text.substring(0, brace))
        loop(text.substring(brace))
      }
    }
  }

  loop(text)
}

UPDATE 3: Here are a set of Clojure test cases (Scala versions are left as an exercise :-)):

(use 'clojure.test)

(deftest test-replace-templates
  (is (=        ; No templates
        (replace-templates "this is a test" {:foo "FOO"})
        "this is a test"))

  (is (=        ; One simple template
        (replace-templates "this is a {foo} test" {:foo "FOO"})
        "this is a FOO test"))

  (is (=        ; Two templates, second at end of input string
        (replace-templates "this is a {foo} test {bar}" {:foo "FOO" :bar "BAR"})
        "this is a FOO test BAR"))

  (is (=        ; Two templates
        (replace-templates "this is a {foo} test {bar} 42" {:foo "FOO" :bar "BAR"})
        "this is a FOO test BAR 42"))

  (is (=        ; Second brace-enclosed item is NOT a template
        (replace-templates "this is a {foo} test {baz} 42" {:foo "FOO" :bar "BAR"})
        "this is a FOO test {baz} 42"))

  (is (=        ; Second item is not a template (no closing brace)
        (replace-templates "this is a {foo} test {bar" {:foo "FOO" :bar "BAR"})
        "this is a FOO test {bar"))

  (is (=        ; First item is enclosed in a non-template brace-pair
        (replace-templates "this is {a {foo} test} {bar" {:foo "FOO" :bar "BAR"})
        "this is {a FOO test} {bar")))

(run-tests)
like image 233
Ralph Avatar asked May 24 '11 12:05

Ralph


4 Answers

I wrote a string interpolation library for Clojure that was brought into clojure-contrib as clojure.contrib.strint. I blogged about it; you'll find a description of the approach there. The most recent source for it can be viewed here on github. The big difference between clojure.contrib.strint and the approaches here is that the latter all perform the interpolation at runtime. In my experience, runtime interpolation is largely unnecessary, and using something like clojure.contrib.strint that performs the interpolation at compile-time often yields tangible performance benefits for your application.

Note that clojure.contrib.strint will hopefully be migrating to clojure.core.strint under Clojure's "new-contrib" organization.

like image 155
cemerick Avatar answered Nov 12 '22 01:11

cemerick


I think the best algorithm you can build is O(n) in the length of the input string and would go something like:

  1. Initialise an empty StringBuilder
  2. Scan the string to find the first "{", add any substring prior to this to your Stringbuilder. If no "{" found, you're finished!
  3. Scan until the next "}". Use whatever is in between the curly braces to do a map lookup in a String->String hashmap and add the result to your StringBuilder
  4. Go back to 2. and continue scanning from after the "}"

Converting to Scala/Clojure left as an exercise :-)

like image 37
mikera Avatar answered Nov 12 '22 01:11

mikera


Here's a version of the clojure implementation using regex to do the replacements. It's faster than your version (running your Lorum ipsum test case 100 times, see further down), and there's less code to maintain:

(defn replace-templates2 [text m]
  (clojure.string/replace text 
                          #"\{\w+\}" 
                          (fn [groups] 
                              ((keyword (subs groups 
                                              1 
                                              (dec (.length groups)))) m))))

The implementation is quick and dirty, but it works. The point is I think you should solve this using regular expressions.


Update:

Experimented a bit with a funky way to do the substringing, and got a surprising performance result. Here's the code:

(defn replace-templates3 [text m]
  (clojure.string/replace text 
                          #"\{\w+\}" 
                          (fn [groups] 
                              ((->> groups
                                    reverse
                                    (drop 1)
                                    reverse
                                    (drop 1)
                                    (apply str)
                                    keyword) m))))

And here are the results on my machine for your version, my first version, and finally this version (100 iterations):

"Elapsed time: 77.475072 msecs"
"Elapsed time: 50.238911 msecs"
"Elapsed time: 38.109875 msecs"
like image 7
Torbjørn Avatar answered Nov 12 '22 00:11

Torbjørn


Some people, when faced with a problem, think "I'll use regex!". Now they have two problems. Others, however, decide not to use regex -- and now they have three problems: implementing and maintaining an ad hoc implementation of half regex, plus the other two.

At any rate, consider this:

import scala.util.matching.Regex

def replaceTemplates(text: String,
                     templates: Map[String, String]): String = 
    """\{([^{}]*)\}""".r replaceSomeIn ( text,  { case Regex.Groups(name) => templates get name } )

It uses string builder to search and replace. The map is using String instead of Symbol because it is faster that way, and the code doesn't replace matches that do not have a valid mapping. Using replaceAllIn would avoid that, but would require some type annotation because that method is overloaded.

You might want to browse Scala's source code from the scaladoc API for Regex, and see what's going on.

like image 6
Daniel C. Sobral Avatar answered Nov 12 '22 01:11

Daniel C. Sobral