(Question credit: Fernando Abrao.)
I hear about the performance benefits of transducers in Clojure, but I'm not sure how to use them.
Say I have a qos/device-qos-range function that returns a sequence of maps, some of which contain a decimal :samplevalue, like so:
[
{ :samplevalue 1.3, ... },
{ :othervalue -27.7, ... },
{ :samplevalue 7.5, ... },
{ :samplevalue 1.9, ... },
]
I'd like to see how many :samplevalues fall into each integer bin, like so:
(frequencies
  (reduce #(if (not (nil? (:samplevalue %2)))
             (conj %1 (.intValue (:samplevalue %2)))
             %1)
          []
          (qos/device-qos-range origem device qos alvo inicio fim)))
;; => {1 2, 7 1}
How can I turn this into a fast version with transducers that eliminates intermediate data structures (such as the vector returned by reduce)? Bonus points for code that can take advantage of multiple cores to do parallel processing.
(Answer credit: Renzo Borgatti (@reborg).)
First, let's set up some sample data, which we'll use for performance tests later. This vector contains 500k maps, all with the same :samplevalue key; each value from 0 to 99999 appears five times, so the bins overlap.
(def data
  (mapv hash-map
        (repeat :samplevalue)
        (concat (range 1e5)
                (range 1e5)
                (range 1e5)
                (range 1e5)
                (range 1e5))))
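As a quick sanity check on the shape of this data, here's an illustrative REPL snippet (not part of the benchmark):
(count data)
;; => 500000
(count (distinct (map :samplevalue data)))
;; => 100000 distinct values, i.e. each appears five times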
Now let's do your transformation with transducers. Note that this solution is not parallel. I shortened your .intValue to just int, which does the same thing. Also, conditionally fetching :samplevalue from each map can be shortened to just (keep :samplevalue sequence), which is equivalent to (remove nil? (map :samplevalue sequence)).
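For example, on the four maps from the question (a small illustration):
(keep :samplevalue [{:samplevalue 1.3} {:othervalue -27.7} {:samplevalue 7.5} {:samplevalue 1.9}])
;; => (1.3 7.5 1.9)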
We'll use Criterium to benchmark.
(require '[criterium.core :refer [quick-bench]])
(quick-bench
  (transduce
    (comp
      (keep :samplevalue)
      (map int))
    (completing #(assoc! %1 %2 (inc (get %1 %2 0))) persistent!)
    (transient {})
    data))
;; My execution time mean: 405 ms
Note that we're not calling frequencies as an external step this time. Instead, we've woven it into the operation. And just as frequencies does internally, we perform the operations on a transient hashmap for extra performance. We do this by using a transient hashmap as the seed and completing the final value by calling persistent! on it.
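To see what completing contributes, here is a stripped-down sketch of the same reducing function in isolation, run on the three sample values from the question (count-into-bins is just an illustrative name):
(def count-into-bins
  (completing
    (fn [acc k] (assoc! acc k (inc (get acc k 0)))) ; step: bump the count for bin k
    persistent!))                                   ; completion: freeze the transient
(transduce (map int) count-into-bins (transient {}) [1.3 7.5 1.9])
;; => {1 2, 7 1}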
We can make this parallel. For maximum performance, we use a mutable Java ConcurrentHashMap instead of an immutable Clojure data structure.
(require '[clojure.core.reducers :as r])
(import 'java.util.Map
        'java.util.concurrent.atomic.AtomicInteger
        'java.util.concurrent.ConcurrentHashMap)
(quick-bench
  (let [concurrency-level (.availableProcessors (Runtime/getRuntime))
        m (ConcurrentHashMap. (quot (count data) 2) 0.75 concurrency-level)
        ;; the zero arity seeds every partition with the shared map `m`;
        ;; the two-arity combine step is a no-op, since all updates land in `m`
        combinef (fn ([] m) ([_ _]))
        ;; count by key: insert an AtomicInteger of 1 on first sight, increment otherwise
        rf (fn [^Map m k]
             (let [^AtomicInteger v (or (.get m k) (.putIfAbsent m k (AtomicInteger. 1)))]
               (when v (.incrementAndGet v))
               m))
        reducef ((comp (keep :samplevalue) (map int)) rf)]
    (r/fold combinef reducef data)
    (into {} m)))
;; My execution time mean: 70 ms
Here we use fold from the clojure.core.reducers library to achieve parallelism. Note that in a parallel context any transducers you use need to be stateless. Also note that a ConcurrentHashMap doesn't support using nil as a key or value; fortunately, we don't need to do that here.
The output is converted into an immutable Clojure hashmap at the end. You can remove that step and just use the ConcurrentHashMap instance for an additional speedup: on my machine, removing the into step makes the whole fold take about 26 ms.
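For comparison, here is a purely immutable parallel sketch (my illustration, not part of the original answer): it folds into ordinary Clojure maps and merges the partition results with merge-with +, so there is no shared mutable state and no final conversion step, at the cost of building and merging many small maps.
(r/fold
  (fn ([] {}) ([m1 m2] (merge-with + m1 m2)))       ; per-partition seed + combine step
  ((comp (keep :samplevalue) (map int))             ; same transducer stack as above
   (fn [acc k] (assoc acc k (inc (get acc k 0)))))  ; count into an immutable map
  data)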
Edit 2017-11-20: User @clojuremostly correctly pointed out that an earlier version of this answer had the call to quick-bench inside the let block that initialized the concurrent hash map instance, which meant that the benchmark reused the same instance for all of its runs. I moved the call to quick-bench outside the let block. It did not significantly affect the results.