I need to mutate elements in a fixed size array of Word8
very quickly. For this purpose I am using an IOUArray
. I need to send this array over a websocket connection. The function sendBinaryData
from the websockets package requires a ByteString
. I need to convert from one representation to the other. I am using this function currently:
arrayToBS :: IOUArray Int Word8 -> IO (BS.ByteString)
arrayToBS = (fmap BS.pack) . getElems
This function converts the elements of an array to a [Word8]
before packing that list into a bytestring, and from profiling I can see that it is quite slow. I was wondering if there a way to speed up this function, or possibly send the array over a websocket connection directly?
The array that I am using currently is:
size = 1000;
numBytes = size * size * 4
newBuffer :: IO (IOUArray Int Word8)
newBuffer = newArray (0, numBytes) 200 :: IO (IOUArray Int Word8)
and an except from the performance report:
COST CENTRE MODULE SRC %time %alloc
arrayToBS Lib src/Lib.hs:28:1-37 88.1 99.0
newBuffer Lib src/Lib.hs:(23,1)-(25,12) 9.9 0.8
Ideally arrayToBS
would be much faster than creating the array.
If I change the size
to 100:
COST CENTRE MODULE SRC %time %alloc
arrayToBS Lib src/Lib.hs:21:1-37 100.0 86.1
mkEncodeTable.table Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:105:5-75 0.0 8.0
mkEncodeTable.ix Data.ByteString.Base64.Internal Data/ByteString/Base64/Internal.hs:104:5-43 0.0 1.1
Disclaimer: I'm not very familiar with these low level primitives so this might be unsafe in some cases.
You will at the very least need to copy the data over once since, as @user2407038 remarks, the underlying data stored in an IOUArray
is an unpinned array, so we can't count on GHC not moving the array around. The reverse direction (ByteString
to IOArray
) is however possible without a copy.
{-# LANGUAGE UnboxedTuples, MagicHash #-}
import Data.ByteString.Internal (ByteString(..))
import Data.Array.IO.Internals (IOUArray(..))
import Data.Array.Base (STUArray(..))
import Data.Word (Word8)
import Foreign.ForeignPtr (mallocForeignPtrBytes, withForeignPtr)
import GHC.IO (IO(..))
import GHC.Exts (copyMutableByteArrayToAddr#, Ptr(..), Int(..))
arrayToBS :: IOUArray Int Word8 -> IO ByteString
arrayToBS (IOUArray (STUArray _ _ n@(I# n') mutByteArr)) = do
bytes <- mallocForeignPtrBytes n
withForeignPtr bytes $ \(Ptr addr) -> IO $ \state ->
(# copyMutableByteArrayToAddr# mutByteArr 0# addr n' state, () #)
pure (PS bytes 0 n)
Here is a test of this working (remember that the ascii code for 'A'
is 65
):
ghci> iou <- newListArray (-2,9) [65,67..] :: IO (IOUArray Int Word8)
ghci> arrayToBS iou
"ACEGIKMOQSUW"
Ok, thanks to user2407038 I have something (Note that I have never played with primitives or unboxed types before):
import Control.Monad.ST
import qualified Data.ByteString as BS
import Data.Word
import Data.Array.ST
import Data.Array.Base
import Data.ByteString.Internal
import GHC.Prim
import GHC.Exts
import GHC.ForeignPtr
bs2Addr# :: BS.ByteString -> Addr#
bs2Addr# (PS fptr offset len) = case fptr of
(ForeignPtr addr _ ) -> addr
arrayPrim (STUArray _ _ _ x) = x
unbox :: Int -> Int#
unbox (I# n#) = n#
copy :: Int -> IO BS.ByteString
copy len = do
-- Get the length as unboxed
let len# = unbox len
-- Bytestring to copy to, filled with 0s initially
let bs = BS.pack (replicate len 0)
-- Create a new STUArray. I don't know why it needs to be length * 2.
arr <- stToIO (newArray (0, len * 2) 255 :: ST s (STUArray s Int Word8))
-- MutableByteArray#
let mArrPrim = arrayPrim arr
-- Addr#
let addr = bs2Addr# bs
-- I don't know what the 2nd and 4th Int# arguments are suppose to be.
let _ = copyMutableByteArrayToAddr# mArrPrim len# addr len# realWorld#
return bs
I am using STUArray
here instead of IOUArray
now because I couldn't find the IOUArray
constructor.
Results of profiling this code with 4000000 element array:
Sun Aug 20 20:49 2017 Time and Allocation Profiling Report (Final)
shoot-exe +RTS -N -p -RTS
total time = 0.05 secs (47 ticks @ 1000 us, 1 processor)
total alloc = 204,067,640 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
copy.bs Lib src/Lib.hs:32:7-36 66.0 96.0
copy Lib src/Lib.hs:(27,1)-(45,11) 34.0 3.9
This is the code that I compared it against:
arrayToBS :: (STUArray s Int Word8) -> ST s (BS.ByteString)
arrayToBS = (fmap BS.pack) . getElems
slowCopy :: Int -> IO BS.ByteString
slowCopy len = do
arr <- stToIO (newArray (0, len - 1) 255 :: ST s (STUArray s Int Word8))
stToIO $ arrayToBS arr
And its profiling report:
Sun Aug 20 20:48 2017 Time and Allocation Profiling Report (Final)
shoot-exe +RTS -N -p -RTS
total time = 0.55 secs (548 ticks @ 1000 us, 1 processor)
total alloc = 1,604,073,872 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
arrayToBS Lib src/Lib.hs:48:1-37 98.2 99.7
slowCopy Lib src/Lib.hs:(51,1)-(53,24) 1.6 0.2
OK, new version is faster. They both produce the same output. However, I would still like to know what the #Int
parameters to copyMutableByteArrayToAddr#
and why I have to multiply the length of the array in the fast version by 2. I'll play around a bit more and update this answer if I find out.
For those curious, this is the result of profiling Alec's answer:
Sun Aug 20 21:13 2017 Time and Allocation Profiling Report (Final)
shoot-exe +RTS -N -p -RTS
total time = 0.01 secs (7 ticks @ 1000 us, 1 processor)
total alloc = 8,067,696 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
newBuffer Other src/Other.hs:23:1-33 85.7 49.6
arrayToBS.\.\ Other src/Other.hs:19:5-69 14.3 0.0
arrayToBS Other src/Other.hs:(16,1)-(20,21) 0.0 49.6
Looks like that's the one to use.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With