I'm trying to figure out if it's possible to put a MutableArray#
in
a compact region. Doing this is explicitly discouraged by ghc devs
and by the documentation
because it allows users to point to something outside of the Compact
.
That aside, I'm still interested in trying to do this, with the understanding
that I would be responsible for ensuring that the array only point to things
inside the same Compact
.
My line of thinking was to add an Array#
to a compact region and then
attempt to thaw it using unsafeThawArray#
:
unsafeThawArray# :: Array# a -> State# s -> (#State# s, MutableArray# s a#)
I should then be able to use writeArray#
, provided that (a) everything I
write into the MutableArray#
is in the same compact region and (b) I
evaluate everything to WHNF with seq
before writing it to the array. I think this would be safe. I do have one concern based on the
stg_unsafeThawArrayzh commentary:
A MUT_ARR_PTRS lives on the mutable list, but a MUT_ARR_PTRS_FROZEN normally doesn't...
I don't understand the internals of GHC well, but here is my understanding of the comment:
There is something called the mutable list, which has a bunch of mutable arrays and
is occassionally scan by the GC. For my purposes, this is problematic, since it means
that unsafeThawArray#
will cause the GC to start scanning things in a compact region.
That's not good. But, maybe my understanding is wrong, which would be great.
If unsafeThawArray#
cannot do what I need, I was thinking that unsafeCoerce#
would
do it, but I again I'd like to hear from someone knowledgeable on this subject. Thanks and let me know if I can clarify anything.
EDIT: Just a comment for future readers. After thinking about this more, I realized that using SmallArray#
instead of Array#
would be better. It should make writes faster. Compare the code for writing to MutableArray# with the code for writing to SmallMutableArray#. The card table that MutableArray#
keeps updated is worthless when everything is on the compact heap since it's never going to get scanned anyway.
This is an interesting idea and while it's a all quite subtle, I believe you can get away with it.
As you point out, the garbage collector needs to track which arrays are mutable as they may contain references to objects residing in younger generations than that of the array itself. Objects on this list will serve as roots when collecting younger generations.
To see why, imagine that you allocate (in the nursery, generation 0) a mutable array with some pointers to other newly-allocated heap objects. On garbage collection, both the array and its contents will be moved to generation 1, meaning that it won't need to be traversed when we scavenge for generation 0.
Now imagine that we allocate a new heap object in generation 0 and mutate an array element to reference it. Now consider what happens when we garbage collect the nursery: if we look only at references within generation 0, we will mistakenly conclude that our newly allocated object is dead. This is obviously wrong: our new object is live via its reference from the array in generation 1.
For this reason, we must maintain a mutable object list for each generation, which we will scavenge as roots when we collect younger generations.
However, as your intuition suggested, this should be unnecessary in a compact region as we shouldn't be scavenging the region's contents anyways. Moreover, since the heap representation of MutableArray#
and Array#
are identical (other than the info table pointer), your use of unsafeCoerce#
should be fine.
All of this is naturally contingent on carefully preserving the closure invariant that compact regions obey. This requires that all computation eventually reduce to some sort of lookup in the region. Strictly speaking even this isn't quite enough as Haskell's semantics don't guarantee that a copy won't be produced, but in the case of GHC Haskell this shouldn't be a problem.
Here's a small playground that I used to test this.
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}
{-# LANGUAGE BangPatterns #-}
import GHC.IO
import GHC.Compact
import GHC.Exts hiding (toList)
data Wombat = Wombat String deriving Show
n = 10000
wombats :: [Wombat]
wombats = take n $ cycle $ map Wombat [ "harry", "jerry", "carry", "larry", "fred" ]
main :: IO ()
main = do
c <- fromListM (length wombats) wombats >>= compact
let arr = getCompact c
flip mapM_ [1..n-1] $ \i -> do
swapMutArray i ((i+2) `mod` n) arr
mapM_ print (toList arr)
return ()
(!) :: MutArray a -> Int -> a
(!) (MutArray a) (I# n) =
case indexArray# a n of
(# x #) -> x
data MutArray a = MutArray (Array# a)
getMutArray :: MutArray a -> MutableArray# s a
getMutArray (MutArray arr) = unsafeCoerce# arr
fromListM :: Int -> [a] -> IO (MutArray a)
fromListM = \n@(I# n#) xs ->
IO $ \s0 -> case newArray# n# x0 s0 of
(# s1, arr #) -> unIO (go arr (n-1) xs) s1
where
x0 = error "hi"
go arr (-1) [] = IO $ \s0 -> case unsafeFreezeArray# arr s0 of
(# s1, arr' #) -> (# s1, MutArray arr' #)
go arr n@(I# n#) (x:xs) = do
IO $ \s0 -> case writeArray# arr n# x s0 of s1 -> (# s1, () #)
go arr (n-1) xs
go _ _ _ = error "uh oh"
toList :: Show a => MutArray a -> [a]
toList arr@(MutArray arr#) = go 0
where
!len = I# (sizeofArray# arr#)
go n
| n == len = []
| otherwise = (arr ! n) : go (n + 1)
writeMutArray :: Int -> a -> MutArray a -> IO ()
writeMutArray (I# i) x arr =
IO $ \s0 -> case writeArray# (getMutArray arr) i x s0 of s1 -> (# s1, () #)
swapMutArray :: Int -> Int -> MutArray a -> IO ()
swapMutArray m@(I# m#) n@(I# n#) arr = do
!x <- IO $ readArray# (getMutArray arr) m#
!y <- IO $ readArray# (getMutArray arr) n#
writeMutArray n x arr
writeMutArray m y arr
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With