GHC per thread GC strategy

I have a Scotty api server which constructs an Elasticsearch query, fetches results from ES and renders the json.

Compared with other servers like Phoenix and Gin, I'm getting higher CPU utilization and throughput when serving ES responses via BloodHound, but Phoenix and Gin were orders of magnitude better than Scotty in memory efficiency.

Stats for Scotty

 wrk -t30 -c100 -d30s "http://localhost:3000/filters?apid=1&hfa=true"
Running 30s test @ http://localhost:3000/filters?apid=1&hfa=true
  30 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   192.04ms  305.45ms   1.95s    83.06%
    Req/Sec   133.42    118.21     1.37k    75.54%
  68669 requests in 30.10s, 19.97MB read
Requests/sec:   2281.51
Transfer/sec:    679.28KB

These stats were collected on my Mac with GHC 7.10.1 installed.

Processor: 2.5 GHz i5
Memory: 8 GB 1600 MHz DDR3

I am quite impressed by GHC's lightweight-thread-based concurrency, but memory efficiency remains a big concern.

Profiling memory usage yielded the following stats:

    39,222,354,072 bytes allocated in the heap
     277,239,312 bytes copied during GC
     522,218,848 bytes maximum residency (14 sample(s))
         761,408 bytes maximum slop
            1124 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       373 colls,   373 par    2.802s   0.978s     0.0026s    0.0150s
  Gen  1        14 colls,    13 par    0.534s   0.166s     0.0119s    0.0253s

  Parallel GC work balance: 42.38% (serial 0%, perfect 100%)

  TASKS: 18 (1 bound, 17 peak workers (17 total), using -N4)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.008s elapsed)
  MUT     time   31.425s  ( 36.161s elapsed)
  GC      time    3.337s  (  1.144s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   34.765s  ( 37.314s elapsed)

  Alloc rate    1,248,117,604 bytes per MUT second

  Productivity  90.4% of total user, 84.2% of total elapsed

gc_alloc_block_sync: 27215
whitehole_spin: 0
gen[0].sync: 8919
gen[1].sync: 30902
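For reference, a summary like the one above is what the threaded RTS prints when run with `+RTS -s`. A sketch of how it is typically produced (the executable and module names here are assumptions, not from the question):

```shell
# Build with the threaded runtime and allow RTS options at run time
ghc -O2 -threaded -rtsopts -o scotty-server Main.hs

# Run on 4 capabilities (-N4) and print the GC/memory summary on exit (-s)
./scotty-server +RTS -N4 -s -RTS
```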

Phoenix never took more than 150 MB, while Gin took much lower memory.

I believe that GHC uses a mark-and-sweep strategy for GC. I also believe it would be better to use a per-thread incremental GC strategy akin to the Erlang VM's, for better memory efficiency.

And judging by Don Stewart's answer to a related question, there must be some way to change the GC strategy in GHC.

I also noted that memory usage remained stable and fairly low when the concurrency level was low, so I suspect memory usage balloons only when concurrency is quite high.

Any ideas/pointers on how to solve this issue?

asked May 31 '15 by user2512324
1 Answer

http://community.haskell.org/~simonmar/papers/local-gc.pdf

This paper by Simon Marlow describes per-thread local heaps, and claims that this was implemented in GHC. It's dated 2011. I can't be sure if this is what the current version of GHC actually does (i.e., did this go into the release version of GHC, is it still the current status quo, etc.), but it seems my recollection wasn't completely made up.

I will also point out the section of the GHC manual that explains the settings you can twiddle to adjust the garbage collector:

https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/runtime-control.html#rts-options-gc

In particular, by default GHC uses a copying (2-space) collector, but adding the -c RTS option switches the oldest generation to a slightly slower compacting (1-space) algorithm, which should eat less RAM.
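As a sketch of how those options are passed on the command line (the flag names are from the GHC users guide; the executable name is an assumption):

```shell
# -c   : use the compacting collector for the oldest generation
# -M512m : set a maximum heap size, so the RTS aborts rather than
#          growing past 512 MB (tune to taste)
./scotty-server +RTS -N4 -c -M512m -RTS
```

Note that the program must have been compiled with -rtsopts for these flags to be accepted at run time.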

I get the impression Simon Marlow is the guy who does most of the RTS stuff (including the garbage collector), so if you can find him on IRC, he's the guy to ask if you want the direct truth...

answered Oct 04 '22 by MathematicalOrchid