Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there any hope to cast ForeignPtr to ByteArray# (for a function :: ByteString -> Vector)

Tags:

haskell

ghc

For performance reasons I would like a zero-copy cast of ByteString (strict, for now) to a Vector. Since Vector is just a ByteArray# under the hood, and ByteString is a ForeignPtr this might look something like:

caseBStoVector :: ByteString -> Vector a
caseBStoVector (BS fptr off len) =
    withForeignPtr fptr $ \ptr -> do
        let ptr' = plusPtr ptr off
            p = alignPtr ptr' (alignment (undefined :: a))
            barr = ptrToByteArray# p len  -- I want this function, or something similar 
            barr' = ByteArray barr
            alignI = minusPtr p ptr
            size = (len-alignI) `div` sizeOf (undefined :: a)
        return (Vector 0 size barr')

That certainly isn't right. Even with the missing function ptrToByteArray# this seems to need to escape the ptr outside of the withForeignPtr scope. So my quesetions are:

  1. This post probably advertises my primitive understanding of ByteArray#, if anyone can talk a bit about ByteArray#, it's representation, how it is managed (GCed), etc I'd be grateful.

  2. The fact that ByteArray# lives on the GCed heap and ForeignPtr is external seems to be a fundamental issue - all the access operations are different. Perhaps I should look at redefining Vector from = ByteArray !Int !Int to something with another indirection? Someing like = Location !Int !Int where data Location = LocBA ByteArray | LocFPtr ForeignPtr and provide wrapping operations for both those types? This indirection might hurt performance too much though.

  3. Failing to marry these two together, maybe I can just access arbitrary element types in a ForeignPtr in a more efficient manner. Does anyone know of a library that treats ForeignPtr (or ByteString) as an array of arbitrary Storable or Primitive types? This would still lose me the stream fusion and tuning from the Vector package.

like image 940
Thomas M. DuBuisson Avatar asked Feb 05 '11 18:02

Thomas M. DuBuisson


2 Answers

Disclaimer: everything here is an implementation detail and specific to GHC and the internal representations of the libraries in question at the time of posting.

This response is a couple years after the fact, but it is indeed possible to get a pointer to bytearray contents. It's problematic as the GC likes to move data in the heap around, and things outside of the GC heap can leak, which isn't necessarily ideal. GHC solves this with:

newPinnedByteArray# :: Int# -> State# s -> (#State# s, MutableByteArray# s#)

Primitive bytearrays (internally typedef'd C char arrays) can be statically pinned to an address. The GC guarantees not to move them. You can convert a bytearray reference to a pointer with this function:

byteArrayContents# :: ByteArray# -> Addr#

The address type forms the basis of Ptr and ForeignPtr types. Ptrs are addresses marked with a phantom type and ForeignPtrs are that plus optional references to GHC memory and IORef finalizers.

Disclaimer: This will only work if your ByteString was built Haskell. Otherwise, you can't get a reference to the bytearray. You cannot dereference an arbitrary addr. Don't try to cast or coerce your way to a bytearray; that way lies segfaults. Example:

{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.IO
import GHC.Prim
import GHC.Types

main :: IO()
main = test

test :: IO ()        -- Create the test array.
test = IO $ \s0 -> case newPinnedByteArray# 8# s0 of {(# s1, mbarr# #) ->
                     -- Write something and read it back as baseline.
                   case writeInt64Array# mbarr# 0# 1# s1 of {s2 ->
                   case readInt64Array# mbarr# 0# s2 of {(# s3, x# #) ->
                     -- Print it. Should match what was written.
                   case unIO (print (I# x#)) s3 of {(# s4, _ #) ->
                     -- Convert bytearray to pointer.
                   case byteArrayContents# (unsafeCoerce# mbarr#) of {addr# ->
                     -- Dereference the pointer.
                   case readInt64OffAddr# addr# 0# s4 of {(# s5, x'# #) ->
                     -- Print what's read. Should match the above.
                   case unIO (print (I# x'#)) s5 of {(# s6, _ #) ->
                     -- Coerce the pointer into an array and try to read.
                   case readInt64Array# (unsafeCoerce# addr#) 0# s6 of {(# s7, y# #) ->
                     -- Haskell is not C. Arrays are not pointers.
                     -- This won't match. It might segfault. At best, it's garbage.
                   case unIO (print (I# y#)) s7 of (# s8, _ #) -> (# s8, () #)}}}}}}}}


Output:
   1
   1
 (some garbage value)

To get the bytearray from a ByteString, you need to import the constructor from Data.ByteString.Internal and pattern match.

data ByteString = PS !(ForeignPtr Word8) !Int !Int
(\(PS foreignPointer offset length) -> foreignPointer)

Now we need to rip the goods out of the ForeignPtr. This part is entirely implementation-specific. For GHC, import from GHC.ForeignPtr.

data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents
(\(ForeignPtr addr# foreignPointerContents) -> foreignPointerContents)

data ForeignPtrContents = PlainForeignPtr !(IORef (Finalizers, [IO ()]))
                        | MallocPtr      (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
                        | PlainPtr       (MutableByteArray# RealWorld)

In GHC, ByteString is built with PlainPtrs which are wrapped around pinned byte arrays. They carry no finalizers. They are GC'd like regular Haskell data when they fall out of scope. Addrs don't count, though. GHC assumes they point to things outside of the GC heap. If the bytearray itself falls out of the scope, you're left with a dangling pointer.

data PlainPtr = (MutableByteArray# RealWorld)
(\(PlainPtr mutableByteArray#) -> mutableByteArray#)

MutableByteArrays are identical to ByteArrays. If you want true zero-copy construction, make sure you either unsafeCoerce# or unsafeFreeze# to a bytearray. Otherwise, GHC creates a duplicate.

mbarrTobarr :: MutableByteArray# s -> ByteArray#
mbarrTobarr = unsafeCoerce#

And now you have the raw contents of the ByteString ready to be turned into a vector.

Best Wishes,

like image 116
user2472093 Avatar answered Nov 13 '22 08:11

user2472093


You might be able to hack together something :: ForeignPtr -> Maybe ByteArray#, but there is nothing you can do in general.

You should look at the Data.Vector.Storable module. It includes a function unsafeFromForeignPtr :: ForeignPtr a -> Int -> Int -> Vector a. It sounds like what you want.

There is also a Data.Vector.Storable.Mutable variant.

like image 2
Antoine Latter Avatar answered Nov 13 '22 09:11

Antoine Latter