Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Single-threaded program profiles 15% of runtime in semaphore_wait_trap

Tags:

profiling

mono

f#

On Mac OS using mono, if I compile and profile the program below, I get the following results:

% fsharpc --nologo -g foo.fs -o foo.exe
% mono --profile=default:stat foo.exe
...
Statistical samples summary
    Sample type: cycles
    Unmanaged hits:     336 (49.1%)
    Managed hits:       349 (50.9%)
    Unresolved hits:      1 ( 0.1%)
  Hits      % Method name
   154  22.48 Microsoft.FSharp.Collections.SetTreeModule:height ...
   105  15.33 semaphore_wait_trap
    74  10.80 Microsoft.FSharp.Collections.SetTreeModule:add ...
...

Note the second entry, semaphore_wait_trap. Here is the program:

[<EntryPoint>]
let main args = 
    let s = seq { 1..1000000 } |> Set.ofSeq
    s |> Seq.iter (fun _ -> ())
    0

I looked in the source for the Set module, but I didn't find any (obvious) locking.

Is my single-threaded program really spending 15% of its execution time messing with semaphores? If it is, can I make it not do that and get a performance boost?

like image 491
Søren Debois Avatar asked Apr 11 '14 20:04

Søren Debois


1 Answers

According to Instruments, it's sgen/gc calling semaphore_wait_trap:

enter image description here

Sgen is documented as stopping all other threads while it collects:

Before doing a collection (minor or major), the collector must stop all running threads so that it can have a stable view of the current state of the heap, without the other threads changing it

In other words, when the code is trying to allocate memory and a GC is required, the time it takes shows up under semaphore_wait_trap since that's your application thread. I suspect the mono profiler doesn't profile the gc thread itself so you don't see the time in the collection code.

The germane output then is really the GC summary:

GC summary
    GC resizes: 0
    Max heap size: 0
    Object moves: 1002691
    Gen0 collections: 123, max time: 14187us, total time: 354803us, average: 2884us
    Gen1 collections: 3, max time: 41336us, total time: 60281us, average: 20093us

If you want your code to run faster, don't collect as often.

Understanding the actual cost of collection can be done through dtrace since sgen has dtrace probes.

like image 100
Tony Lee Avatar answered Oct 17 '22 10:10

Tony Lee