
Converting IEEE 754 floating point in Haskell Word32/64 to and from Haskell Float/Double

Question

In Haskell, the base libraries and Hackage packages provide several means of converting binary IEEE-754 floating point data to and from the lifted Float and Double types. However, the accuracy, performance, and portability of these methods are unclear.

For a GHC-targeted library intended to (de)serialize a binary format across platforms, what is the best approach for handling IEEE-754 floating point data?

Approaches

These are the methods I've encountered in existing libraries and online resources.

FFI Marshaling

This is the approach used by the data-binary-ieee754 package. Since Float, Double, Word32 and Word64 are each instances of Storable, one can poke a value of the source type into an external buffer, and then peek a value of the target type:

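    import qualified Foreign as F
    -- In current versions of base, unsafePerformIO is exported from System.IO.Unsafe.
    import System.IO.Unsafe (unsafePerformIO)

    toFloat :: (F.Storable word, F.Storable float) => word -> float
    toFloat word = unsafePerformIO $ F.alloca $ \buf -> do
        F.poke (F.castPtr buf) word
        F.peek buf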

On my machine this works, but I cringe to see allocation being performed just to accomplish the coercion. Also, although not unique to this solution, there's an implicit assumption here that IEEE-754 is actually the in-memory representation. The tests accompanying the package give it the "works on my machine" seal of approval, but this is not ideal.
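One detail worth noting about the polymorphic signature above: it also type-checks for mismatched sizes (for example Word64 to Float), in which case the poke writes past the buffer that alloca sized for the target type. A minimal monomorphic sketch of the same marshaling idea follows. The name word32ToFloatFFI is made up for illustration and is not taken from data-binary-ieee754, and the code still assumes that the in-memory representation of Float is IEEE-754 binary32.

```haskell
import Data.Word (Word32)
import Foreign.Marshal.Utils (with)
import Foreign.Ptr (castPtr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Hypothetical helper: reinterpret the bits of a Word32 as a Float.
-- 'with' stores the word in temporary memory and passes a pointer to it;
-- the pointer is then cast and read back as a Float.
-- Assumption: Float is stored as IEEE-754 binary32 (4 bytes, like Word32).
word32ToFloatFFI :: Word32 -> Float
word32ToFloatFFI w = unsafeDupablePerformIO $ with w (peek . castPtr)
```

Fixing both types in the signature rules out the size mismatch at compile time, at the cost of needing one such function per pair of types.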

unsafeCoerce

With the same implicit assumption of in-memory IEEE-754 representation, the following code gets the "works on my machine" seal as well:

    toFloat :: Word32 -> Float
    toFloat = unsafeCoerce

This has the benefit of not performing explicit allocation like the approach above, but the documentation says "it is your responsibility to ensure that the old and new types have identical internal representations". That implicit assumption is still doing all the work, and it is even more of a stretch when dealing with lifted types.

unsafeCoerce#

Stretching the limits of what might be considered "portable":

    toFloat :: Word -> Float
    toFloat (W# w) = F# (unsafeCoerce# w)

This seems to work, but doesn't seem practical at all since it's limited to the GHC.Exts types. It's nice to bypass the lifted types, but that's about all that can be said.

encodeFloat and decodeFloat

This approach has the nice property of bypassing anything with unsafe in the name, but doesn't seem to get IEEE-754 quite right. A previous SO answer to a similar question offers a concise approach, and the ieee754-parser package used a more general approach before being deprecated in favor of data-binary-ieee754.

There's quite a bit of appeal to having code that needs no implicit assumptions about underlying representation, but these solutions rely on encodeFloat and decodeFloat, which are apparently fraught with inconsistencies. I've not yet found a way around these problems.
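To make the difficulty concrete, here is a rough sketch of what an encodeFloat-based decoder for binary32 can look like. It is not the code from the SO answer or from ieee754-parser; the name word32ToFloat and the case split are my own illustration. It assumes only that Float can represent every binary32 value, and it shows where the special cases (NaN, infinities, negative zero, subnormals) have to be handled by hand because encodeFloat cannot produce them directly.

```haskell
import Data.Bits (shiftR, testBit, (.&.))
import Data.Word (Word32)

-- Hypothetical helper: decode an IEEE-754 binary32 bit pattern arithmetically,
-- without relying on the in-memory representation of Float.
word32ToFloat :: Word32 -> Float
word32ToFloat w
  | e == 0xFF && m /= 0 = 0 / 0                         -- NaN; sign and payload are lost
  | e == 0xFF           = sign (1 / 0)                  -- positive or negative infinity
  | e == 0              = sign (encodeFloat m (-149))   -- zeros and subnormals
  | otherwise           = sign (encodeFloat (m + 0x800000) (e - 150))
  where
    negative = testBit w 31                               -- sign bit
    e = fromIntegral ((w `shiftR` 23) .&. 0xFF) :: Int    -- biased exponent
    m = fromIntegral (w .&. 0x7FFFFF) :: Integer          -- 23-bit fraction field
    -- Negating after construction is what yields -0.0 for the pattern 0x80000000.
    sign x = if negative then negate x else x
```

Even with the special cases spelled out, this is not a bit-exact inverse of a serializer: every NaN pattern collapses to whatever NaN the expression 0 / 0 produces, which is one of the ways this family of solutions falls short of IEEE-754.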

asked Aug 08 '11 00:08 by acfoltzer



2 Answers

Simon Marlow mentions another approach in GHC bug 2209 (also linked to from Bryan O'Sullivan's answer):

You can achieve the desired effect using castSTUArray, incidentally (this is the way we do it in GHC).

I've used this option in some of my libraries in order to avoid the unsafePerformIO required for the FFI marshalling method.

    {-# LANGUAGE FlexibleContexts #-}

    import Data.Word (Word32, Word64)
    import Data.Array.ST (newArray, readArray, MArray, STUArray)
    -- Newer versions of the array package export castSTUArray from
    -- Data.Array.Unsafe rather than Data.Array.ST.
    import Data.Array.Unsafe (castSTUArray)
    import GHC.ST (runST, ST)

    wordToFloat :: Word32 -> Float
    wordToFloat x = runST (cast x)

    floatToWord :: Float -> Word32
    floatToWord x = runST (cast x)

    wordToDouble :: Word64 -> Double
    wordToDouble x = runST (cast x)

    doubleToWord :: Double -> Word64
    doubleToWord x = runST (cast x)

    -- Write the value into a one-element unboxed array, reinterpret the
    -- array's element type, and read the element back.
    {-# INLINE cast #-}
    cast :: (MArray (STUArray s) a (ST s),
             MArray (STUArray s) b (ST s)) => a -> ST s b
    cast x = newArray (0 :: Int, 0) x >>= castSTUArray >>= flip readArray 0

I inlined the cast function because doing so causes GHC to generate much tighter core. After inlining, wordToFloat is translated to a call to runSTRep and three primops (newByteArray#, writeWord32Array#, readFloatArray#).

I'm not sure what performance is like compared to the FFI marshalling method, but just for fun I compared the core generated by both options.

Doing FFI marshalling is a fair bit more complicated in this regard. It calls unsafeDupablePerformIO and 7 primops (noDuplicate#, newAlignedPinnedByteArray#, unsafeFreezeByteArray#, byteArrayContents#, writeWord32OffAddr#, readFloatOffAddr#, touch#).

I've only just started learning how to analyse core; perhaps someone with more experience can comment on the cost of these operations?

answered Oct 15 '22 20:10 by Jacob Stanley


All modern CPUs use IEEE 754 for floating point, and this seems very unlikely to change within our lifetime. So don't worry about code making that assumption.

You are very definitely not free to use unsafeCoerce or unsafeCoerce# to convert between integral and floating point types, as this can cause both compilation failures and runtime crashes. See GHC bug 2209 for details.

Until GHC bug 4092, which addresses the need for int↔fp coercions, is fixed, the only safe and reliable approach is via the FFI.
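For readers on a newer toolchain than the one this answer was written against: assuming base 4.10 or later (the version that shipped with GHC 8.2), the GHC.Float module exports bit-cast functions for exactly these four conversions. The sketch below simply wraps them under the names used elsewhere on this page; the wrapper names are for illustration only, and the functions are GHC-specific.

```haskell
import Data.Word (Word32, Word64)
import GHC.Float
  ( castDoubleToWord64
  , castFloatToWord32
  , castWord32ToFloat
  , castWord64ToDouble
  )

-- Thin wrappers (names chosen to match the castSTUArray example on this page).
-- Assumption: base >= 4.10, where GHC.Float provides these casts.
wordToFloat :: Word32 -> Float
wordToFloat = castWord32ToFloat

floatToWord :: Float -> Word32
floatToWord = castFloatToWord32

wordToDouble :: Word64 -> Double
wordToDouble = castWord64ToDouble

doubleToWord :: Double -> Word64
doubleToWord = castDoubleToWord64
```

These reinterpret the bits without an explicit buffer or unsafePerformIO in user code, but they carry the same assumption as the other approaches: the Word32 or Word64 being converted must already be in IEEE-754 layout, so byte order still has to be handled when reading the serialized data.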

answered Oct 15 '22 20:10 by Bryan O'Sullivan