Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Binary Serialization for Lists of Undefined Length in Haskell

I've been using Data.Binary to serialize data to files. In my application I incrementally add items to these files. The two most popular serialization packages, binary and cereal, both serialize lists as a count followed by the list items. Because of this, I can't append to my serialized files. I currently read in the whole file, deserialize the list, append to the list, re-serialize the list, and write it back out to the file. However, my data set is getting large and I'm starting to run out of memory. I could probably go around unboxing my data structures to gain some space, but that approach doesn't scale.

One solution would be to get down and dirty with the file format to change the initial count, then just append my elements. But that's not very satisfying, not to mention being sensitive to future changes in the file format as a result of breaking the abstraction. Iteratees/Enumerators come to mind as an attractive option here. I looked for a library combining them with a binary serialization, but didn't find anything. Anyone know if this has been done already? If not, would a library for this be useful? Or am I missing something?

like image 852
mightybyte Avatar asked Jun 01 '11 17:06

mightybyte


1 Answers

So I say stick with Data.Binary but write a new instance for growable lists. Here's the current (strict) instance:

instance Binary a => Binary [a] where
    put l  = put (length l) >> mapM_ put l
    get    = do n <- get :: Get Int
                getMany n

-- | 'getMany n' get 'n' elements in order, without blowing the stack.
getMany :: Binary a => Int -> Get [a]
getMany n = go [] n
 where
    go xs 0 = return $! reverse xs
    go xs i = do x <- get
                 x `seq` go (x:xs) (i-1)
{-# INLINE getMany #-}

Now, a version that lets you stream (in binary) to append to a file would need to be eager or lazy. The lazy version is the most trivial. Something like:

import Data.Binary

newtype Stream a = Stream { unstream :: [a] }

instance Binary a => Binary (Stream a) where

    put (Stream [])     = putWord8 0
    put (Stream (x:xs)) = putWord8 1 >> put x >> put (Stream xs)

    get = do
        t <- getWord8
        case t of
            0 -> return (Stream [])
            1 -> do x         <- get
                    Stream xs <- get
                    return (Stream (x:xs))

Massaged appropriately works for streaming. Now, to handle silently appending, we'll need to be able to seek to the end of the file, and overwrite the final 0 tag, before adding more elements.

like image 55
Don Stewart Avatar answered Nov 15 '22 20:11

Don Stewart