Write binary data with Haskell to be read by C?

I have a file containing a [Double] serialized by Data.Binary that I'd like to read with C. That is, I want to write a C program that reads that data into memory as double[]. I'm planning on writing a Haskell program to deserialize the data file and then write the binary data into a new, simpler file that I can just directly read into C, but I'm not sure how to write out just the raw binary data (e.g. 8 bytes for a double).

erjiang asked Jan 23 '12 00:01


2 Answers

You can reuse Data.Binary for this purpose with the data-binary-ieee754 package, which serialises Floats and Doubles as their IEEE 754 representation. For example:

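import Data.List (genericLength)
import Data.Binary.Put
import Data.Binary.IEEE754 (putFloat64le)

-- Write a 64-bit little-endian element count, followed by each Double
-- as its 8-byte little-endian IEEE 754 bit pattern.
putRawDoubles :: [Double] -> Put
putRawDoubles xs = do
  putWord64le $ genericLength xs
  mapM_ putFloat64le xs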

It would be nice if data-binary-ieee754 had an analogue of putWord64host for Doubles, but since it doesn't, I just went with little-endian. If you want to be portable across endiannesses without explicitly handling the conversion in your C program, you could try putWord64host . doubleToWord (doubleToWord is also from Data.Binary.IEEE754). Be aware, though, that I think integer endianness differs from floating-point endianness on some platforms, in which case that trick would not produce the host's native double layout.
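On the C side, a reader for the layout produced by putRawDoubles could look roughly like the sketch below. It assumes the file holds a 64-bit little-endian count followed by that many little-endian IEEE 754 doubles, that the host's double is an 8-byte IEEE 754 value, and that the host stores integers and doubles in the same byte order. The names load_u64_le and read_raw_doubles are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode 8 little-endian bytes into a 64-bit unsigned integer.
   Works the same way regardless of the host's byte order. */
static uint64_t load_u64_le(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical helper: read a 64-bit little-endian count, then that many
   little-endian doubles. Returns a malloc'd array (the caller frees it)
   and stores the element count in *n_out. Returns NULL on a short read
   or allocation failure. */
double *read_raw_doubles(FILE *f, size_t *n_out) {
    unsigned char buf[8];

    /* Assumption: the host's double is an 8-byte IEEE 754 binary64. */
    assert(sizeof(double) == 8);

    if (fread(buf, 1, 8, f) != 8)
        return NULL;
    uint64_t n = load_u64_le(buf);
    if (n > SIZE_MAX / sizeof(double))
        return NULL;

    /* Allocate at least one byte so an empty list is not mistaken for failure. */
    double *xs = malloc(n ? (size_t)n * sizeof(double) : 1);
    if (!xs)
        return NULL;

    for (uint64_t i = 0; i < n; i++) {
        if (fread(buf, 1, 8, f) != 8) {
            free(xs);
            return NULL;
        }
        /* Assumption: integers and doubles share the host byte order,
           so copying the decoded bit pattern yields the right value. */
        uint64_t bits = load_u64_le(buf);
        memcpy(&xs[i], &bits, sizeof bits);
    }

    *n_out = (size_t)n;
    return xs;
}
```

Decoding byte by byte keeps the reader independent of the host's integer byte order. On a little-endian machine with IEEE 754 doubles, a single fread straight into a double array would likely work as well, at the cost of that portability.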

Incidentally, I would suggest using a format like this even for your regular serialisation; IEEE floats are universal, and binary's default floating-point format is wasteful (as Daniel Fischer points out).

You might also want to consider the cereal serialisation library, which is faster than binary, better-maintained (binary hasn't been updated since 2009) and has IEEE float format support built-in.

ehird answered Nov 08 '22 23:11


Using Data.Binary to serialize Double or Float values is not great for portability. The Binary instances serialize the values in the form obtained by decodeFloat, i.e. as a mantissa and an exponent, with the mantissa serialized as an Integer. Parsing that from C is inconvenient. Much better, as ehird has already suggested, is to use a variant that serializes them as the bit pattern of the IEEE-754 representation, as offered by cereal-ieee754 (which, as ehird reminded me, has been merged into cereal, minus some conversions between floating-point and word types) or the already mentioned data-binary-ieee754. Another option is to serialize them as strings via show, which has the advantage of avoiding any endianness problems.
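For the string-based option, the C side could be as small as the sketch below. It assumes the Haskell program wrote one shown value per line (for example "1.0" or "1.0e-2"), which is a layout chosen for this illustration rather than anything prescribed above. The name read_text_doubles is likewise hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse up to cap doubles from f, one per line.
   Returns the number of values stored in out, or (size_t)-1 if a
   non-empty line does not start with a number. */
size_t read_text_doubles(FILE *f, double *out, size_t cap) {
    char line[128];
    size_t n = 0;

    assert(out != NULL);

    while (n < cap && fgets(line, sizeof line, f)) {
        if (line[0] == '\n')
            continue;                 /* skip empty lines */

        char *end;
        double v = strtod(line, &end);
        if (end == line)
            return (size_t)-1;        /* nothing parseable on this line */
        out[n++] = v;
    }
    return n;
}
```

strtod accepts the exponent notation that show produces, so values written this way should read back unchanged on an implementation with correctly rounded conversions. The trade-off is a larger file and slower parsing than the raw 8-byte format.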

Daniel Fischer answered Nov 08 '22 22:11