What's the best way to get a sequence of columns (as vectors or whatever) from an Incanter data set?
I thought of:
(to-vect (trans (to-matrix my-dataset)))
Ideally, though, I'd like a lazy sequence. Is there a better way?
Use the $ function.
=> (def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
=> ($ :a data) ;; :a column
=> ($ 0 :all data) ;; first row
=> (type ($ :a data))
clojure.lang.LazySeq
Looking at the source code for to-vect, it uses map to build up the result, which already provides one degree of laziness. Unfortunately, it looks like the whole data set is first converted with toArray, which probably gives away all the benefits of map's laziness.
If you want more, you probably have to dive into the gory details of the Java object effectively holding the matrix version of the data set and write your own version of to-vect.
You could use the internal structure of the dataset.
user=> (use 'incanter.core)
nil
user=> (def d (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
#'user/d
user=> (:column-names d)
[:a :b]
user=> (:rows d)
[{:a 1, :b 2} {:a 3, :b 4}]
user=> (defn columns-of
         [dataset]
         (for [column (:column-names dataset)]
           (map #(get % column) (:rows dataset))))
#'user/columns-of
user=> (columns-of d)
((1 3) (2 4))
That said, I'm not sure to what extent the internal structure is public API. You should probably check that with the Incanter maintainers.
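Since the question asks for laziness, here is a minimal sketch to check that columns-of does no work until a column is requested. It assumes only the :column-names and :rows keys shown in the session above, and it uses a plain map (the hypothetical fake-dataset) as a stand-in so that it runs without Incanter on the classpath. A real dataset may behave differently if its internal layout changes.

```clojure
;; A plain map standing in for an Incanter dataset. Assumption: only the
;; :column-names and :rows keys seen in the REPL session are relied upon.
(def fake-dataset
  {:column-names [:a :b]
   :rows         [{:a 1 :b 2} {:a 3 :b 4}]})

(defn columns-of
  "Returns a lazy seq of columns; each column is itself a lazy seq."
  [dataset]
  (for [column (:column-names dataset)]
    (map #(get % column) (:rows dataset))))

(def cols (columns-of fake-dataset))

;; `for` returns an unrealized lazy seq, so nothing has been computed yet.
(println (realized? cols))  ; prints false

;; Asking for the first column forces the outer seq.
(println (first cols))      ; prints (1 3)
```

Because the column names are held in a vector, the outer sequence is realized in chunks, so asking for the first column may also create the (still unrealized) inner sequences for neighbouring columns. The values of each column are only pulled from the rows when that column is actually traversed.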