I've written a soft real-time application in Haskell, which deals with simulated physics, collision detection, all that good stuff. In doing all that I allocate a lot of memory, and I could probably optimise my memory usage if I wanted to, but since I'm sitting nicely at 40% CPU and only 1% RAM used anyway, that doesn't seem necessary. What I am seeing, though, is that a lot of the time, when the garbage collector kicks in, frames are skipped. I've verified that this is the cause of the problem by profiling with threadscope: no useful computation happens for sometimes up to 0.05 seconds while the garbage collector does its business, resulting in up to 3 skipped frames, which is very noticeable and very annoying.
Now, I tried solving this by manually calling performMinorGC every frame, and this seems to alleviate the issue, making it much smoother, apart from the fact that overall CPU usage goes drastically up to around 70%. Clearly I'd rather avoid this.
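For concreteness, a minimal sketch of what "calling performMinorGC every frame" looks like; worldStep and renderFrame are hypothetical stand-ins for the application's own per-frame logic:

```haskell
import Control.Monad (forever)
import System.Mem (performMinorGC)

gameLoop :: IO ()
gameLoop = forever $ do
  worldStep      -- simulate physics, collision detection, ...
  renderFrame    -- draw the frame
  performMinorGC -- collect the nursery now, at a predictable point,
                 -- instead of letting a pause land mid-frame

-- Stubs so the sketch is self-contained:
worldStep, renderFrame :: IO ()
worldStep   = pure ()
renderFrame = pure ()
```

The idea is that the pause still happens, but at a moment of the frame where there is slack time, rather than wherever the allocation limit happens to be hit.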
Another thing I tried was decreasing the GC's allocation space down to 64k from 512k with -H64k, and I also tried setting -I0.03 to try to get it to collect more often. Both of these options changed the pattern of garbage collection that I saw in threadscope, but they still resulted in skipped frames.
Can anyone with some experience in GC optimisation help me out here? Am I doomed to calling performMinorGC manually and putting up with the resulting massive performance loss?
EDIT
I tried to run it for a similar amount of time in these tests, but since it's real-time there's no point at which it's 'done'.
Runtime statistics with performMinorGC every 4 frames:
9,776,109,768 bytes allocated in the heap
349,349,800 bytes copied during GC
53,547,152 bytes maximum residency (14 sample(s))
12,123,104 bytes maximum slop
105 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 15536 colls, 15536 par 3.033s 0.997s 0.0001s 0.0192s
Gen 1 14 colls, 13 par 0.207s 0.128s 0.0092s 0.0305s
Parallel GC work balance: 6.15% (serial 0%, perfect 100%)
TASKS: 20 (2 bound, 13 peak workers (18 total), using -N4)
SPARKS: 74772 (20785 converted, 0 overflowed, 0 dud, 38422 GC'd, 15565 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 9.773s ( 7.368s elapsed)
GC time 3.240s ( 1.126s elapsed)
EXIT time 0.003s ( 0.004s elapsed)
Total time 13.040s ( 8.499s elapsed)
Alloc rate 1,000,283,400 bytes per MUT second
Productivity 75.2% of total user, 115.3% of total elapsed
gc_alloc_block_sync: 29843
whitehole_spin: 0
gen[0].sync: 11
gen[1].sync: 71
With no performMinorGC:
12,316,488,144 bytes allocated in the heap
447,495,936 bytes copied during GC
63,556,272 bytes maximum residency (15 sample(s))
15,418,296 bytes maximum slop
146 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 19292 colls, 19292 par 2.613s 0.950s 0.0000s 0.0161s
Gen 1 15 colls, 14 par 0.237s 0.165s 0.0110s 0.0499s
Parallel GC work balance: 2.67% (serial 0%, perfect 100%)
TASKS: 17 (2 bound, 13 peak workers (15 total), using -N4)
SPARKS: 100714 (29688 converted, 0 overflowed, 0 dud, 47577 GC'd, 23449 fizzled)
INIT time 0.000s ( 0.001s elapsed)
MUT time 13.377s ( 9.917s elapsed)
GC time 2.850s ( 1.115s elapsed)
EXIT time 0.000s ( 0.006s elapsed)
Total time 16.247s ( 11.039s elapsed)
Alloc rate 920,744,995 bytes per MUT second
Productivity 82.5% of total user, 121.4% of total elapsed
gc_alloc_block_sync: 68533
whitehole_spin: 0
gen[0].sync: 9
gen[1].sync: 147
The overall productivity seems to be lower for no performMinorGC now than when I tested it yesterday, for some reason -- before it was always >90%.
Short of avoiding garbage collection altogether, there is only one way to make garbage collection faster: ensure that as few objects as possible are reachable during the garbage collection. The fewer objects that are alive, the less there is to be marked. This is the rationale behind the generational heap.
The Haskell runtime system employs a generational garbage collector (GC) with two generations. Generations are numbered starting with the youngest generation at zero. Values are always allocated in a special part of the youngest generation called the nursery.
An application that spends 1% of its execution time on garbage collection can lose more than 20% throughput on a 32-processor system. If we increase the GC time to 2%, the overall throughput will drop by another 20%. Such is the impact of suspending 32 executing threads simultaneously!
Haskell computations produce a lot of memory garbage -- much more than conventional imperative languages. This is because data is immutable, so the only way to store each operation's result is to create new values.
You have a very large old generation: around 100 MB.
By default, GHC performs a major GC when the heap size reaches 2x of its size after the last major GC. That means that at some point the GC had to scan and copy 50 MB of data. If your processor has a memory throughput limit of 10 GB/s, then loading and copying 50 MB will take at least 0.01 sec (compare with the gen1 average and max pause).
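The arithmetic behind that estimate, as a back-of-the-envelope Haskell sketch (the 50 MB and 10 GB/s figures are the assumptions from the paragraph above):

```haskell
-- ~50 MB live in the old generation, ~10 GB/s memory throughput.
liveBytes, bandwidth :: Double
liveBytes = 50 * 1024 * 1024
bandwidth = 10 * 1024 ^ (3 :: Int)

-- A copying collection must at least read and write the live data,
-- hence the factor of 2; this comes out to roughly 0.01 s.
pauseLowerBound :: Double
pauseLowerBound = 2 * liveBytes / bandwidth
```

This is only a lower bound: cache misses and pointer chasing during the scan usually make the real pause longer.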
(I'm assuming that you checked the eventlog to ensure a major GC is actually running during the 0.05 sec pause, so it is not an issue with thread synchronization, where the GC waits for other threads instead of doing real work.)
So, to minimize GC pauses, you should keep the old generation small. If most of this 50 MB is static data allocated at the very beginning that lives until the end (e.g. textures or meshes), then you are stuck. The only workaround I know is to pack the data into e.g. a storable vector and unpack parts of it when you need them.
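A sketch of that workaround, with hypothetical mesh data flattened into Floats: a Storable vector is backed by a single flat buffer that the GC does not traverse element by element, so major collections no longer pay for the payload.

```haskell
import qualified Data.Vector.Storable as VS

-- Flattened x/y/z vertex components; illustration only.
type Component = Float

-- Pack the boxed list into one unboxed buffer, once, at load time.
packMesh :: [Float] -> VS.Vector Component
packMesh = VS.fromList

-- Unpack just the part you need, when you need it
-- (here: triangle i, i.e. 3 vertices * 3 components).
triangle :: VS.Vector Component -> Int -> [Component]
triangle mesh i = VS.toList (VS.slice (9 * i) 9 mesh)
```

The trade-off is that you re-box small pieces on demand instead of keeping millions of long-lived boxed values alive across every major GC.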
If the data is allocated during execution and lives a limited amount of time (but long enough to survive a few major collections), then try to rethink your pipeline. Usually no data should survive one frame, so you are probably doing something wrong, e.g. retaining data when you should not.
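One common way data accidentally survives a frame is through lazy thunks that keep a reference to the previous world state. A hedged sketch of forcing each frame's result to normal form before the frame ends (stepFrame is a hypothetical wrapper, not part of the poster's code):

```haskell
import Control.DeepSeq (NFData, force)

-- Fully evaluate the new world before returning it, so no thunk in it
-- still points at (and thereby retains) the previous frame's world.
stepFrame :: NFData world => (world -> world) -> world -> world
stepFrame step w = force (step w)
```

With this pattern the old world becomes garbage immediately and dies young in the nursery, instead of being dragged into the old generation by an unevaluated closure.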
Another bad sign: a gen0 max pause of 0.02 sec. That is pretty strange. By default the gen0 allocation area is 0.5 MB, so gen0 GC should be fast. Probably you have a large remembered set. Possible causes: mutable structures (IORef, mutable Vector, etc.) or a lot of lazy thunk updates.
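A small example of the thunk-update problem with IORef, and the usual fix (the counter is illustrative; any frequently updated mutable cell behaves the same):

```haskell
import Data.IORef

-- Lazy: each call stores an unevaluated (+1) thunk chained onto the
-- previous one; the GC must track these old-to-new pointers in the
-- remembered set and walk the chain later.
tickLazy :: IORef Int -> IO ()
tickLazy counter = modifyIORef counter (+ 1)

-- Strict: the new value is evaluated before it is written, so nothing
-- lingers for the collector.
tickStrict :: IORef Int -> IO ()
tickStrict counter = modifyIORef' counter (+ 1)
```

The same distinction applies to strict vs. lazy writes into mutable vectors and record fields: prefer evaluating before storing.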
And a minor (probably unrelated) issue: it looks like you are using implicit parallelism, but only 1/3 of the sparks are converted. You are creating too many sparks; half of them are GC'd.
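A common remedy, sketched here with the parallel package, is to coarsen spark granularity: spark one chunk of work rather than one spark per element, so fewer sparks fizzle or get GC'd before a worker picks them up (the chunk size and expensiveStep are illustrative assumptions):

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, using)

-- One spark per 1024 elements instead of one per element.
stepAll :: [Double] -> [Double]
stepAll xs = map expensiveStep xs `using` parListChunk 1024 rdeepseq
  where
    expensiveStep = sqrt . abs  -- stand-in for real per-body work
```

Tune the chunk size so the number of sparks is a small multiple of your capability count (-N4 here), and the converted/fizzled ratio in +RTS -s should improve.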