Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Clojure, how can I do a performant version of `frequencies` with transducers?

(Question credit: Fernando Abrao.)

I hear about the performance benefits of transducers in Clojure, but I'm not sure how to use them.

Say I have a qos/device-qos-range function that returns sequence of maps, some of which contain a decimal :samplevalue, like so:

[
  { :samplevalue 1.3, ... },
  { :othervalue -27.7, ... },
  { :samplevalue 7.5, ... },
  { :samplevalue 1.9, ... },
]

I'd like to see how many :samplevalues fall into each integer bin, like so:

(frequencies
  (reduce #(if (not (nil? (:samplevalue %2)))
             (conj %1 (.intValue (:samplevalue %2))))
          []
          (qos/device-qos-range origem device qos alvo inicio fim)))

;; => {1 2, 7 1}

How can I turn this into a fast version with transducers that eliminates intermediate data structures (such as the one returned by reduce)? Bonus points for code that can take advantage of multiple cores to do parallel processing.

like image 368
Jeff Terrell Ph.D. Avatar asked Nov 17 '17 20:11

Jeff Terrell Ph.D.


1 Answers

(Answer credit: Renzo Borgatti (@reborg).)

First, let's set up some sample data, which we'll use for performance tests later. This vector contains 500k maps with the same key. Values are overlapping 1/5th of the time.

(def data 
 (mapv hash-map 
       (repeat :samplevalue) 
       (concat (range 1e5)
               (range 1e5)
               (range 1e5)
               (range 1e5)
               (range 1e5))))

Now let's do your transformation with transducers. Note that this solution is not parallel. I shortened your .intValue to just int, which does the same thing. Also, conditionally fetching :samplevalue from each map can be shortened to just (keep :samplevalue sequence), which is equivalent to (remove nil? (map :samplevalue sequence)). We'll use Criterium to benchmark.

(require '[criterium.core :refer [quick-bench]])
(quick-bench
  (transduce
    (comp
      (keep :samplevalue)
      (map int))
    (completing #(assoc! %1 %2 (inc (get %1 %2 0))) persistent!)
    (transient {})
    data))
;; My execution time mean: 405 ms

Note that we're not calling frequencies as an external step this time. Instead, we've woven it into the operation. And just like what frequencies does, we've done the operations on a transient hashmap for extra performance. We do this by using a transient hashmap as the seed and completing the final value by calling persistent! on it.

We can make this parallel. For maximum performance, we use a mutable Java ConcurrentHashMap instead of an immutable Clojure data structure.

(require '[clojure.core.reducers :as r])
(import '[java.util HashMap Collections Map]
        'java.util.concurrent.atomic.AtomicInteger
        'java.util.concurrent.ConcurrentHashMap)

(quick-bench
  (let [concurrency-level (.availableProcessors (Runtime/getRuntime))
        m (ConcurrentHashMap. (quot (count data) 2) 0.75 concurrency-level)
        combinef (fn ([] m) ([_ _]))  ; just return `m` from the combine step
        rf (fn [^Map m k]
             (let [^AtomicInteger v (or (.get m k) (.putIfAbsent m k (AtomicInteger. 1)))]
               (when v (.incrementAndGet v))
               m))
        reducef ((comp (keep :samplevalue) (map int)) rf)]
    (r/fold combinef reducef data)
    (into {} m)))
;; My execution time mean: 70 ms

Here we use fold from the clojure.core.reducers library to achieve parallelism. Note that in a parallel context any transducers one uses need to be stateless. Also note that a ConcurrentHashMap doesn't support using nil as a key or value; fortunately, we don't need to do that here.

The output is converted into an immutable Clojure hashmap at the end. You can remove that step and just use the ConcurrentHashMap instance for an additional speedup—on my machine, removing the into step makes the whole fold take about 26ms.

Edit 2017-11-20: User @clojuremostly correctly pointed out that an earlier version of this answer had the call to quick-bench inside the let block that initialized the concurrent hash map instance, which meant that the benchmark used the same instance for all of its runs. I moved the call to quick-bench to be outside the let block. It did not significantly affect the results.

like image 58
Jeff Terrell Ph.D. Avatar answered Oct 18 '22 12:10

Jeff Terrell Ph.D.