
Scientific dataset manipulation in Clojure -- reading ByteBuffers into matrices

I'm looking to use Clojure and Incanter to process a large scientific dataset; specifically, the 0.5-degree version of this dataset (only available in binary format).

My question is: what would you recommend as an elegant way to deal with this problem in Java/Clojure? Is there a simple way to get this dataset into Incanter, or into some other Java matrix package?

I managed to read the binary data into a java.nio.ByteBuffer using the following code:

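;; Despite its name, this returns a little-endian ByteBuffer, not a float array.
;; `io` is the asker's namespace alias (its require is not shown in the original).
(defn to-float-array [^String path]
  (-> (io/to-byte-array (io/to-file path))
      java.nio.ByteBuffer/wrap
      (.order java.nio.ByteOrder/LITTLE_ENDIAN)))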

Now I'm struggling with how to manipulate this ByteBuffer as an array. I've been using Python's NumPy, which makes it very easy to manipulate these huge datasets. Here's the Python code for what I'm looking to do:

import numpy as np

# Reshape the flat array into (time, lat_slices, lon_slices),
# then keep every other time slice.
rain_data = np.fromfile("path/to/file", dtype="f")
rain_data = rain_data.reshape(24, 360, 720)
rain_data = rain_data[0:23:2, :, :]

After this slicing, I want to return a vector of these twelve arrays. (I need to manipulate them each separately as future function inputs.)

So, any advice on how to get this dataset into Incanter would be much appreciated.

Sam Ritchie asked Feb 01 '11 17:02



1 Answer

I don't know how to convert your ByteBuffer into an array, but here's an implementation of the reshape function:

(defn reshape [v c]
  (if (= (count v) 1)
    c
    (recur (butlast v)
           (partition (last v) c))))

(This works fine in my limited testing.) If your data is in a vector r then you can implement

rain_data = rain_data.reshape(24, 360, 720)

as

(reshape '(24 360 720) r)
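
For the part this answer leaves open (getting from the ByteBuffer to something reshape can consume, then keeping every other time slice), here is one possible sketch. The helper names byte-buffer->floats and time-slices are hypothetical and chosen only for illustration; the sketch assumes the buffer holds nothing but 4-byte floats and already has the right byte order set, as in the question's code. reshape is repeated from above so the block stands alone.

```clojure
(import '(java.nio ByteBuffer ByteOrder))

;; reshape as given in this answer, repeated so the sketch is self-contained.
(defn reshape [v c]
  (if (= (count v) 1)
    c
    (recur (butlast v)
           (partition (last v) c))))

;; Hypothetical helper: copy every remaining 4-byte float in buf into a
;; fresh float array. The float view honours the byte order set on buf.
(defn byte-buffer->floats [^ByteBuffer buf]
  (let [fb  (.asFloatBuffer buf)
        arr (float-array (.remaining fb))]
    (.get fb arr)
    arr))

;; Hypothetical helper: reshape to dims (e.g. [24 360 720]) and keep every
;; other slice along the first axis, like rain_data[0:23:2, :, :] in NumPy.
(defn time-slices [dims buf]
  (->> (byte-buffer->floats buf)
       (reshape dims)
       (take-nth 2)
       vec))

;; Small demo: 8 floats shaped as 4 time steps of 1 x 2 values.
(def demo
  (let [buf (-> (ByteBuffer/allocate 32)
                (.order ByteOrder/LITTLE_ENDIAN))]
    (dotimes [i 8] (.putFloat buf (float i)))
    (.flip buf)
    (time-slices [4 1 2] buf)))
;; demo is a vector of 2 slices (time steps 0 and 2)
```

For the dataset in the question, (time-slices [24 360 720] (to-float-array "path/to/file")) should give a vector of twelve 360 x 720 slices. Note that partition builds lazy sequences of boxed numbers, so this may be slow and memory-hungry for roughly six million floats; indexing directly into the float array is likely a better fit if performance matters.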
bdesham answered Sep 26 '22 05:09
