In Haskell, I created a Vector of 1000000 IntMaps. I then used Gloss to render a picture in a way that accesses random intmaps from that vector.
That is, I had keep every single one of them in memory. The rendering function itself is very lightweight, so the performance was supposed to be good.
Yet, the program was running at 4fps. Upon profiling, I noticed 95% of the time was spent on GC. Fair enough:
The GC is crazily scanning my vector, even though it never changes.
Is there any way to tell GHC "this big value is needed and will not change - do not try to collect anything inside it".
Edit: the program below is sufficient to replicate the issue.
import qualified Data.IntMap as Map
import qualified Data.Vector as Vec
import Graphics.Gloss
import Graphics.Gloss.Interface.IO.Animate
import System.Random
main = do
let size = 10000000
let gen i = Map.fromList $ zip [mod i 10..0] [0..mod i 10]
let vec = Vec.fromList $ map gen [0..size]
let draw t = do
rnd <- randomIO :: IO Int
let empty = Map.null $ vec Vec.! mod rnd size
let rad = if empty then 10 else 50
return $ translate (20 * cos t) (20 * sin t) (circle rad)
animateIO (InWindow "hi" (256,256) (1,1)) white draw
This accesses a random map on a huge vector and draws a rotating circle whose radius depend on whether the map is empty.
Despite that logic being very simple, the program struggles at around 1 FPS here.
gloss is the culprit here.
First, a little background on GHC's garbage collector. GHC uses (by default) a generational, copying garbage collector. This means that the heap consists of several memory areas called generations. Objects are allocated into the youngest generation. When a generation becomes full, it is scanned for live objects and the live objects are copied into the next older generation, and then the generation that was scanned is marked as empty. When the oldest generation becomes full, the live objects are instead copied into a new version of the oldest generation.
An important fact to take away from this is that the GC only ever examines live objects. Dead objects are never touched at all. This is great when collecting generations that are mostly garbage, as often happens in the youngest generation. It's not good if long-lived data undergoes many GCs, as it will be copied repeatedly. (It can also be counterintuitive to those used to malloc/free-style memory management, where allocation and deallocation are both quite expensive, but leaving objects allocated for a long time has no direct cost.)
Now, the "generational hypothesis" is that most objects are either short-lived or long-lived. The long-lived objects will quickly end up in the oldest generation since they are alive at every collection. Meanwhile, most of the short-lived objects that are allocated will never survive the youngest generation; only those that happen to be alive when it is collected will be promoted to the next generation. Similarly, most of those short-lived objects that do get promoted will not survive to the third generation. As a result, the oldest generation that holds the long-lived objects should fill up very slowly, and its expensive collections that have to copy all the long-lived objects should occur rarely.
Now, all of this is actually true in your program, except for one problem:
let displayFun backendRef = do
-- extract the current time from the state
timeS <- animateSR `getsIORef` AN.stateAnimateTime
-- call the user action to get the animation frame
picture <- frameOp (double2Float timeS)
renderS <- readIORef renderSR
portS <- viewStateViewPort <$> readIORef viewSR
windowSize <- getWindowDimensions backendRef
-- render the frame
displayPicture
windowSize
backColor
renderS
(viewPortScale portS)
(applyViewPortToPicture portS picture)
-- perform GC every frame to try and avoid long pauses
performGC
gloss tells the GC to collect the oldest generation every frame!
This might be a good idea if those collections are expected to take less time than the delay between frames anyways, but it's clearly not a good idea for your program. If you remove that performGC
call from gloss, then your program runs quite quickly. Presumably if you let it run for long enough, then the oldest generation will eventually fill up and you might get a delay of a few tenths of a second as the GC copies all your long-lived data, but that's much better than paying that cost every frame.
All that said, there is a ticket #9052 about adding a stable generation, which would also suit your needs nicely. See there for more details.
I would try compiling -with-rtsopts
and then playing with the heap (-H
) and/or allocator (-A
) options. Those greatly influence how the GC works.
More info here: https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime-control.html
To add to Reid's answer, I've found performMinorGC
(added in https://ghc.haskell.org/trac/ghc/ticket/8257) to be the best of both worlds here.
Without any explicit GC scheduling, I still get frequent collection-related frame drops as the nursery becomes exhausted. But performGC
indeed becomes performance-killing as soon as there is any significant long-lived memory usage.
performMinorGC
does what we want, ignoring long-term memory and cleaning up the garbage from each frame predictably - especially if you tune -H
and -A
to ensure that the per-frame garbage fits in the nursery.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With