I'm facing the general problem of compiling large data sets into an on-disk representation that can be efficiently deserialized into native in-memory Haskell data structures.
More specifically, I have a large amount of graph data with various attributes associated with edges and vertices. In C/C++ I have compiled the data into an mmap()-able representation for maximum efficiency, which currently results in about 200 MiB worth of C structures (and whose text representation is about 600 MiB).
What is the next-best thing I can do in (GHC) Haskell?
Use the binary package. It provides a toolbox to efficiently serialize and deserialize data in Haskell. binary can automatically derive instances of the required typeclasses for you, but you can also write optimized instances manually.
Quoted from the original description page:
The binary package
Efficient, pure binary serialisation using lazy ByteStrings. Haskell values may be encoded to and from binary formats, written to disk as binary, or sent over the network. Serialisation speeds of over 1 G/sec have been observed, so this library should be suitable for high performance scenarios.
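As a minimal sketch of how this looks in practice (the `Edge` type here is hypothetical, standing in for your graph data): deriving `Generic` lets binary supply the `Binary` instance automatically, and `encodeFile`/`decodeFile` round-trip the data through disk.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Binary (Binary, encodeFile, decodeFile)
import GHC.Generics (Generic)

-- Hypothetical edge record standing in for your attributed graph data.
data Edge = Edge { from :: Int, to :: Int, weight :: Double }
  deriving (Show, Eq, Generic)

-- The instance body is derived automatically via Generic.
instance Binary Edge

main :: IO ()
main = do
  let edges = [Edge 0 1 1.5, Edge 1 2 2.0]
  encodeFile "edges.bin" edges       -- serialize the list to disk
  edges' <- decodeFile "edges.bin"   -- deserialize it back
  print (edges' == edges)            -- True on a successful round-trip
```

For large data sets you would typically write a hand-optimized `Binary` instance instead of relying on the `Generic` derivation, trading convenience for encode/decode speed.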