
What is the idiomatic way to obtain a sequence of columns from an incanter dataset?

What's the best way to get a sequence of columns (as vectors or whatever) from an Incanter data set?

I thought of:

(to-vect (trans (to-matrix my-dataset)))

But ideally, I'd like a lazy sequence. Is there a better way?

Rob Lachlan asked Mar 30 '11 04:03



3 Answers

Use the $ function.

=> (def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
=> ($ :a data)  ;; :a column
=> ($ 0 :all data) ;; first row

=> (type ($ :a data))
clojure.lang.LazySeq
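
To get every column rather than a single one, $ can be mapped over the column names. This is only a minimal sketch: it assumes incanter.core is on the classpath and that col-names returns the column keys in dataset order. The name columns is a hypothetical helper, not part of Incanter.

```clojure
(use 'incanter.core)

(def data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))

;; Hypothetical helper: a lazy seq with one entry per column.
;; map is lazy, so ($ col ds) is only called for a column
;; once that element of the outer seq is consumed.
(defn columns [ds]
  (map #($ % ds) (col-names ds)))

;; Usage: pull out the columns and look at the first one.
(first (columns data))
```

Whether the values inside each column are themselves computed lazily depends on what $ does internally in your Incanter version, so treat the laziness here as applying to the outer sequence only.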
Paul Lam answered Oct 29 '22 20:10



Looking at the source code for to-vect, it uses map to build up the result, which already provides one degree of laziness. Unfortunately, it looks like the whole data set is first converted with toArray, which probably gives away all the benefits of map's laziness.

If you want more, you probably have to dive into the gory details of the Java object that actually holds the matrix version of the data set and write your own version of to-vect.
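
To make the laziness point concrete, here is a rough sketch in plain Clojure that does not touch Incanter at all. The rows vector is a hypothetical stand-in for the data, not Incanter's actual matrix representation, and lazy-columns is an illustrative name. It only shows that a map-based column builder does no work until it is consumed, which is exactly the benefit that is lost if the source is copied into an array up front.

```clojure
;; Hypothetical stand-in for a dataset's rows (not Incanter's real storage).
(def rows [{:a 1 :b 2} {:a 3 :b 4}])

;; Outer seq is lazy over the column keys;
;; each inner seq is lazy over the rows.
(defn lazy-columns [ks rows]
  (map (fn [k] (map #(get % k) rows)) ks))

(def cols (lazy-columns [:a :b] rows))

;; Nothing has been computed yet, so this is false.
(println (realized? cols))

;; Consuming the first column realizes its values.
(println (first cols))
```

Clojure realizes lazy seqs over vectors in chunks, so "lazy" here means deferred, not strictly one element at a time.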

skuro answered Oct 29 '22 21:10



You could use the internal structure of the dataset.

user=> (use 'incanter.core)
nil
user=> (def d (to-dataset [{:a 1 :b 2} {:a 3 :b 4}]))
#'user/d
user=> (:column-names d)
[:a :b]
user=> (:rows d)
[{:a 1, :b 2} {:a 3, :b 4}]
user=> (defn columns-of
         [dataset]
         (for [column (:column-names dataset)]
           (map #(get % column) (:rows dataset))))
#'user/columns-of
user=> (columns-of d)
((1 3) (2 4))

I'm not sure, though, to what extent the internal structure is public API. You should probably check that with the Incanter developers.

kotarak answered Oct 29 '22 21:10
