Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is this a correct usage of transients?

In the talk "Bootstrapping Clojure at Groupon" by Tyler Jennings, from 25:14 to 28:24, he discusses two implementations of a separate function, both using transients:

(defn separate-fast-recur [pred coll]
  (loop [true-elements (transient [])
         false-elements (transient [])
         my-coll coll]
    (if (not (empty? my-coll))
      (let [curr (first my-coll)
            tail (rest my-coll)]
        (if (pred curr)
          (recur (conj! true-elements curr) false-elements tail)
          (recur true-elements (conj! false-elements curr) tail)))
      [(persistent! true-elements) (persistent! false-elements)])))

(defn separate-fast-doseq [pred coll]
  (let [true-elements (transient [])
        false-elements (transient [])]
    (doseq [curr coll]
      (if (pred curr)
        (conj! true-elements curr)
        (conj! false-elements curr)))
      [(persistent! true-elements) (persistent! false-elements)]))

(These are both copied verbatim, including the unidiomatic indentation in the last line of the second one.)

He notes that, in the benchmark he used, the first function above took 1.1 seconds while the second function above took 0.8 seconds, noting that the second is therefore superior to the first. However, according to the Clojure documentation on transients:

Note in particular that transients are not designed to be bashed in-place. You must capture and use the return value in the next call.

Thus it seems to me that this separate-fast-doseq function is incorrect. But I am having difficulty believing that it is incorrect, considering the nature of the rest of the talk.

Is this separate-fast-doseq function a correct usage of transients? Why or why not? (And if not, what is an example of it breaking?)

like image 723
Sam Estep Avatar asked Mar 09 '23 14:03

Sam Estep


1 Answers

The second implementation is incorrect for exactly the reason you suspect. A transient collection is permitted to mutate itself for efficiency, but it is never required to, so any one of these conj! calls may return an object with a different identity. If that happens, then by discarding the result of conj!, the function you have pasted will behave incorrectly.

However, I cannot provide an example of it breaking. In the current implementation of Clojure, as it happens, conj! does always mutate itself in place. Note the unconditional return this at the end. Therefore, this function will behave as expected. However, it relies for its correctness on implementation details, which may change at any time.

For an example of a similar sort of operation which does break, try using a map instead of a vector:

(let [m (transient {})]
  (doseq [i (range 20)]
    (assoc! m i i))
  (count (persistent! m)))

8
like image 64
amalloy Avatar answered Mar 18 '23 23:03

amalloy