I'm looking to use Clojure and Incanter to process a large scientific dataset; specifically, the 0.5 degree version of this dataset (only available in binary format).
My question is: what would you recommend as an elegant way to deal with this problem in Java/Clojure? Is there a simple way to get this dataset into Incanter, or into some other Java matrix package?
I managed to read the binary data into a java.nio.ByteBuffer using the following code:
;; Despite the name, this returns a little-endian ByteBuffer, not a float array.
(defn to-float-array [^String str]
  (-> (io/to-byte-array (io/to-file str))
      java.nio.ByteBuffer/wrap
      (.order java.nio.ByteOrder/LITTLE_ENDIAN)))
Now, I'm really struggling with how to begin manipulating this ByteBuffer as an array. I've been using Python's NumPy, which makes it very easy to manipulate these huge datasets. Here's the Python code for what I'm looking to do:
import numpy as np

# reshape row vector into (time, lat_slices, lon_slices),
# then keep every other slice along the first (time) axis
rain_data = np.fromfile("path/to/file", dtype="f")
rain_data = rain_data.reshape(24, 360, 720)
rain_data = rain_data[0:23:2, :, :]
After this slicing, I want to return a vector of these twelve arrays. (I need to manipulate them each separately as future function inputs.)
So, any advice on how to get this dataset into Incanter would be much appreciated.
I don't know how to convert your ByteBuffer into an array, but here's an implementation of the reshape function:
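;; v is the list of dimensions, c the flat collection.
;; The first dimension is never used for partitioning; it is implied by the data.
(defn reshape [v c]
  (if (= (count v) 1)
    c
    (recur (butlast v)
           (partition (last v) c))))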
(This works fine in my limited testing.) If your data is in a vector r, then you can implement
rain_data = rain_data.reshape(24, 360, 720);
as
(reshape '(24 360 720) r)
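For the ByteBuffer step that I left open above, one possibility is to view the buffer as a java.nio.FloatBuffer and copy it into a primitive float array, then cut that array into time slices with java.util.Arrays/copyOfRange. What follows is only a rough sketch, not something I have run against the real file: the names buffer->floats and every-other-slice are made up for illustration, and it assumes the file holds nothing but 4-byte floats in the byte order already set on the buffer.

```clojure
(import '(java.nio ByteBuffer ByteOrder)
        '(java.util Arrays))

;; Hypothetical helper: copies every remaining 4-byte float in buf into a
;; new float array. The view inherits the byte order set on buf.
(defn buffer->floats [^ByteBuffer buf]
  (let [fb  (.asFloatBuffer buf)
        out (float-array (.remaining fb))]
    (.get fb out)
    out))

;; Hypothetical helper: treats data as consecutive slices of slice-len
;; values and returns a vector holding slices 0, 2, 4, ... as float arrays.
(defn every-other-slice [^floats data slice-len]
  (let [n (quot (alength data) slice-len)]
    (mapv (fn [i]
            (Arrays/copyOfRange data
                                (int (* i slice-len))
                                (int (* (inc i) slice-len))))
          (range 0 n 2))))

;; Assumed usage with the question's own function and dimensions:
;; (every-other-slice (buffer->floats (to-float-array "path/to/file"))
;;                    (* 360 720))
```

With 24 time steps of 360 x 720 values, this should give a vector of twelve flat float arrays, which corresponds to the rain_data[0:23:2,:,:] slice in the question. Each array is still flat; it could be reshaped with (reshape '(360 720) a), or, as far as I know, handed to Incanter's matrix function together with a column count of 720, though I have not checked that part.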