Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clojure: sequence back to vector

Tags:

clojure

How can I cast a sequence back to vector after a sequence producing operation (like sort)? Does using (vec..) on a sequence that was a vector is costly?

One (bad?) possibility is creating a new vector out of sequence:

(vec (sort [1 2 3 4 5 6]))

I am asking because I need random access (nth ..) to huge sorted vectors - which are now huge sequences after the sort, with horrible O(n) random access time

like image 700
GabiMe Avatar asked Jan 01 '10 18:01

GabiMe


Video Answer


2 Answers

Meikel Brandmeyer just posted a solution to this on the Clojure group.

(defn sorted-vec
  [coll]
  (let [arr (into-array coll)]
    (java.util.Arrays/sort arr)
    (vec arr)))

Clojure's sort returns a seq across a sorted array; this approach does much the same thing, but returns a vector, not a seq.

If you wish, you can even skip the conversion back into a Clojure persistent data structure:

(defn sorted-arr
  "Returns a *mutable* array!"
  [coll]
  (doto (into-array coll)]
    (java.util.Arrays/sort))

but the resulting Java array (which you can treat as a Clojure collection in most cases) will be mutable. That's fine if you're not handing it off to other code, but be careful.

like image 69
Rich Avatar answered Sep 19 '22 03:09

Rich


From my own tests (nothing scientific) you may be better with working directly on arrays in cases where you do lots of sorting. But if you sort rarely and have a lots of random access to do though, going with a vector may be a better choice as random access time is more than 40% faster on average, but the sorting performance is horrible due to converting the vector to an array and then back to a vector. Here's my findings:

(def foo (int-array (range 1000)))

(time
  (dotimes [_ 10000]
    (java.util.Arrays/sort foo)))

; Elapsed time: 652.185436 msecs

(time
  (dotimes [_ 10000]
    (nth foo (rand-int 1000))))

; Elapsed time: 7.900073 msecs

(def bar (vec (range 1000)))

(time
  (dotimes [_ 10000]
    (vec (sort bar))))

; Elapsed time: 2810.877103 msecs

(time
  (dotimes [_ 10000]
    (nth bar (rand-int 1000))))

; Elapsed time: 5.500802 msecs

P.S.: Note that the vector version doesn't actually store the sorted vector anywhere, but that shouldn't change the result considerably as you would use simple bindings in a loop for speed.

like image 41
Nicolas Buduroi Avatar answered Sep 23 '22 03:09

Nicolas Buduroi