 

How to persist large data for efficient deserialization in Haskell

Tags:

haskell

I'm facing the general problem of compiling large data sets into an on-disk representation that can be efficiently deserialized into native in-memory Haskell data structures.

More specifically, I have a large amount of graph data with various attributes associated with edges and vertices. In C/C++ I have compiled the data into a mmap()able representation for maximum efficiency, which currently comes to about 200 MiB worth of C structures (whose text representation is about 600 MiB).

What is the next-best thing I can do in (GHC) Haskell?

asked Jul 02 '11 by hvr

1 Answer

Use the binary package. It provides a toolbox to efficiently serialize and deserialize data in Haskell. binary can automagically derive instances of the required type classes for you, but you can also write optimized instances by hand.
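As a minimal sketch of what that looks like, assuming a binary version with GHC.Generics-based deriving, you could model the graph roughly like this (the Vertex, Edge and Graph types and their fields are just illustrative placeholders for your actual attribute data):

```haskell
{-# LANGUAGE DeriveGeneric #-}
module Main where

import Data.Binary (Binary, encodeFile, decodeFile)
import GHC.Generics (Generic)

-- Hypothetical graph representation; the real vertex/edge
-- attributes would come from your application.
data Vertex = Vertex
  { vertexId    :: Int
  , vertexLabel :: String
  } deriving (Show, Generic)

data Edge = Edge
  { edgeFrom   :: Int
  , edgeTo     :: Int
  , edgeWeight :: Double
  } deriving (Show, Generic)

data Graph = Graph
  { vertices :: [Vertex]
  , edges    :: [Edge]
  } deriving (Show, Generic)

-- With Generic instances in place, binary can derive the
-- serialization code; the empty instance bodies use the defaults.
instance Binary Vertex
instance Binary Edge
instance Binary Graph

main :: IO ()
main = do
  let g = Graph [Vertex 0 "a", Vertex 1 "b"] [Edge 0 1 1.5]
  encodeFile "graph.bin" g           -- write the on-disk representation
  g' <- decodeFile "graph.bin"       -- read it back
  print (g' :: Graph)
```

For maximum performance you would replace the derived instances with hand-written ones (and likely swap the lists for unboxed vectors), but the derived code is a reasonable starting point.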

Quoted from the original description page:

The binary package

Efficient, pure binary serialisation using lazy ByteStrings. Haskell values may be encoded to and from binary formats, written to disk as binary, or sent over the network. Serialisation speeds of over 1 G/sec have been observed, so this library should be suitable for high performance scenarios.

answered Sep 17 '22 by fuz