Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Clojure, can I optimize scans?

;; Suppose we want to compute the min and max of a collection.
;; Ideally there would be a way to tell Clojure that we want to perform
;; only one scan, which will theoretically save a little time  

;; First we define some data to test with
;; 10MM element lazy-seq
(def data (for [x (range 10000000)] (rand-int 100)))

;; Realize the lazy-seq 
(dorun data)

;; Here is the amount of time it takes to go through the data once
(time (apply min data))
==> "Elapsed time: 413.805 msecs"

;; Here is the time to calc min, max by explicitly scanning twice
(time (vector (apply min data) (apply max data)))
==> "Elapsed time: 836.239 msecs"

;; Shouldn't this be more efficient since it's going over the data once?
(time (apply (juxt min max) data))
==> "Elapsed time: 833.61 msecs"

Chuck, here are my results after using your solution:

test.core=> (def data (for [x (range 10000000)] (rand-int 100)))
#'test.core/data

test.core=> (dorun data)
nil

test.core=> (realized? data)
true

test.core=> (defn minmax1 [coll] (vector (apply min coll) (apply max coll)))    
#'test.core/minmax1

test.core=> (defn minmax2 [[x & xs]] (reduce (fn [[tiny big] n] [(min tiny n) (max big n)]) [x x] xs))    
#'test.core/minmax2

test.core=> (time (minmax1 data))
"Elapsed time: 806.161 msecs"
[0 99]

test.core=> (time (minmax2 data))
"Elapsed time: 6072.587 msecs"
[0 99]
like image 891
Badmanchild Avatar asked Feb 14 '23 05:02

Badmanchild


1 Answers

This isn't precisely going to answer your general question (i.e. how to scan Clojure data structures), but it is worth being aware that this kind of code is often going to be better suited to specialised data structures / libraries if you really care about performance.

e.g. using core.matrix / vectorz-clj and a little bit of cheeky Java interop:

;; define the raw data
(def data (for [x (range 10000000)] (rand-int 100)))

;; convert to a Vectorz array
(def v (array :vectorz data))

(time (Vectorz/minValue v))
"Elapsed time: 18.974904 msecs"
0.0

(time (Vectorz/maxValue v))
"Elapsed time: 21.310835 msecs"
99.0

i.e. this is around 20-50x faster than the original code given in the question.

I doubt you'll get remotely close to that with any code that depends on scanning over regular Clojure vectors, whether you do it in one pass or otherwise. Basically - use the right tool for the job.

like image 154
mikera Avatar answered Mar 06 '23 16:03

mikera