I'm looking for a way to efficiently serialize Clojure objects into a binary format - i.e. not just doing the classic print and read text serialization.
That is, I want to do something like:
(def orig-data {:name  "Data Object"
                :data  (get-big-java-array)
                :other (get-clojure-data-stuff)})
(def binary (serialize orig-data))
;; here "binary" is a raw binary form, e.g. a Java byte array
;; so it can be persisted in key/value store or sent over network etc.
;; now check it works!
(def new-data (deserialize binary))
(= new-data orig-data)
=> true
The motivation is that I have some large data structures that contain a significant amount of binary data (in Java arrays), and I want to avoid the overhead of converting these all to text and back again. In addition, I'm trying to keep the format compact in order to minimise network bandwidth usage.
Specific features I'd like to have:
What's the best / standard approach to doing this in Clojure?
I may be missing something here, but what's wrong with the standard Java serialization? Too slow, too big, something else?
A Clojure wrapper for plain Java serialization could be something like this:
(defn serializable? [v]
  (instance? java.io.Serializable v))

(defn serialize
  "Serializes value, returns a byte array"
  [v]
  (let [buff (java.io.ByteArrayOutputStream. 1024)]
    (with-open [dos (java.io.ObjectOutputStream. buff)]
      (.writeObject dos v))
    (.toByteArray buff)))

(defn deserialize
  "Accepts a byte array, returns deserialized value"
  [bytes]
  (with-open [dis (java.io.ObjectInputStream.
                   (java.io.ByteArrayInputStream. bytes))]
    (.readObject dis)))
user> (= (range 10) (deserialize (serialize (range 10))))
true
There are values that cannot be serialized, e.g. Java streams and Clojure atoms, agents and futures, but it should work for most plain values, including Java primitives and arrays, and Clojure functions, collections and records.
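One caveat: serializable? only checks the top-level value. A map is itself Serializable even when it contains, say, an input stream, and in that case serialize above throws java.io.NotSerializableException partway through. A minimal sketch of a variant (try-serialize is a hypothetical name, not part of any library) that returns nil instead of throwing:

```clojure
(defn try-serialize
  "Hypothetical helper: like serialize, but returns nil instead of
  throwing when some value reachable from v is not Serializable."
  [v]
  (let [buff (java.io.ByteArrayOutputStream. 1024)]
    (try
      (with-open [oos (java.io.ObjectOutputStream. buff)]
        (.writeObject oos v))
      (.toByteArray buff)
      (catch java.io.NotSerializableException _
        nil))))

;; The outer map is Serializable in both cases; only the second
;; one contains a value (a stream) that is not.
(try-serialize {:ok [1 2 3]})                                       ; returns a byte array
(try-serialize {:stream (java.io.ByteArrayInputStream. (byte-array 0))}) ; returns nil
```

Catching the exception only tells you that something inside failed; the exception message names the offending class if you need to track it down.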
Whether you actually save anything depends on the data. In my limited testing on smallish data sets, serializing to text and to binary took about the same time and space.
But for the special case where the bulk of the data is arrays of Java primitives, Java serialization can be orders of magnitude faster and save a significant chunk of space. (Quick test on a laptop, 100k random bytes: serialize 0.9 ms, 100kB; text 490 ms, 700kB.)
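To check the space side of this on your own data, something like the sketch below compares the Java-serialized size with the size of the printed text. The helper names are made up for illustration, and it measures size only, not time. Note that pr-str on a raw Java array prints an object reference rather than its contents, so the text version prints a vector of the bytes.

```clojure
(defn java-serialized-size
  "Hypothetical helper: number of bytes Java serialization produces for v."
  [v]
  (let [buff (java.io.ByteArrayOutputStream.)]
    (with-open [oos (java.io.ObjectOutputStream. buff)]
      (.writeObject oos v))
    (.size buff)))

(defn text-size
  "Hypothetical helper: number of UTF-8 bytes in the pr-str form of v."
  [v]
  (count (.getBytes ^String (pr-str v) "UTF-8")))

;; 100k random bytes in the range -128..127, as in the quick test above.
(def arr (byte-array (repeatedly 100000 #(- (rand-int 256) 128))))

(java-serialized-size arr) ; the raw array length plus a small header
(text-size (vec arr))      ; several characters per byte, plus separators
```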
Note that the (= new-data orig-data) test doesn't work for arrays (it delegates to Java's equals, which for arrays just tests whether both are the same object), so you may want or need to write your own equality function to test the serialization.
user> (def a (range 10))
user> (= a (range 10))
true
user> (= (into-array a) (into-array a))
false
user> (.equals (into-array a) (into-array a))
false
user> (java.util.Arrays/equals (into-array a) (into-array a))
true
Nippy is one of the best choices imho: https://github.com/ptaoussanis/nippy
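For reference, a basic Nippy round trip looks roughly like this. It assumes the com.taoensso/nippy dependency is on the classpath and uses only nippy/freeze (value to byte array) and nippy/thaw (byte array back to value); the sample data is made up to mirror the question.

```clojure
;; Assumes com.taoensso/nippy is on the classpath.
(require '[taoensso.nippy :as nippy])

(def orig-data {:name  "Data Object"
                :data  (byte-array [1 2 3 4])
                :other [:some {:clojure "data"}]})

;; freeze returns a byte array; thaw turns it back into Clojure data.
(def binary   (nippy/freeze orig-data))
(def new-data (nippy/thaw binary))

;; = still can't compare the arrays themselves, so check them separately.
(= (dissoc new-data :data) (dissoc orig-data :data))
(java.util.Arrays/equals ^bytes (:data new-data) ^bytes (:data orig-data))
```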
Have you considered Google's protobuf? You might want to check the GitHub repository with the interface for Clojure.