MailboxProcessor performance problems

I have been trying to design a system which allows a large number of concurrent users to be represented in memory at the same time. When setting out to design this system I immediately thought of some sort of actor-based solution akin to Erlang.

The system has to be done in .NET, so I started working on a prototype in F# using MailboxProcessor, but I have run into serious performance problems with it. My initial idea was to use one actor (MailboxProcessor) per user to serialize the communication for that user.

I have isolated a small piece of code that reproduces the problem I am seeing:

open System.Threading;
open System.Diagnostics;

type Inc() =

    let mutable n = 0;
    let sw = new Stopwatch()

    member x.Start() =
        sw.Start()

    member x.Increment() =
        if Interlocked.Increment(&n) >= 100000 then
            printf "UpdateName Time %A" sw.ElapsedMilliseconds

type Message
    = UpdateName of int * string

type User = {
    Id : int
    Name : string
}

[<EntryPoint>]
let main argv = 

    let sw = Stopwatch.StartNew()
    let incr = new Inc()
    let mb = 

        Seq.initInfinite(fun id -> 
            MailboxProcessor<Message>.Start(fun inbox -> 

                let rec loop user =
                    async {
                        let! m = inbox.Receive()

                        match m with
                        | UpdateName(id, newName) ->
                            let user = {user with Name = newName};
                            incr.Increment()
                            do! loop user
                    }

                loop {Id = id; Name = sprintf "User%i" id}
            )
        ) 
        |> Seq.take 100000
        |> Array.ofSeq

    printf "Create Time %i\n" sw.ElapsedMilliseconds
    incr.Start()

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i));

    System.Console.ReadLine() |> ignore

    0

Just creating the 100k actors takes around 800ms on my quad core i7. Then submitting the UpdateName message to each one of the actors and waiting for them to complete takes about 1.8 seconds.

Now, I realize there is overhead from all the queueing on the ThreadPool, setting/resetting AutoResetEvents, etc. internally in the MailboxProcessor. But is this really the expected performance? From reading both MSDN and various blogs on the MailboxProcessor I have gotten the idea that it is meant to be akin to Erlang actors, but from the abysmal performance I am seeing this doesn't seem to hold true in reality?

I also tried a modified version of the code, which uses 8 MailboxProcessors, each of which holds a Map<int, User> that is used to look up a user by id. It yielded some improvement, bringing the total time for the UpdateName operation down to 1.2 seconds, but it still feels very slow. The modified code is here:

open System.Threading;
open System.Diagnostics;

type Inc() =

    let mutable n = 0;
    let sw = new Stopwatch()

    member x.Start() =
        sw.Start()

    member x.Increment() =
        if Interlocked.Increment(&n) >= 100000 then
            printf "UpdateName Time %A" sw.ElapsedMilliseconds

type Message
    = CreateUser of int * string
    | UpdateName of int * string

type User = {
    Id : int
    Name : string
}

[<EntryPoint>]
let main argv = 

    let sw = Stopwatch.StartNew()
    let incr = new Inc()
    let mb = 

        Seq.initInfinite(fun id -> 
            MailboxProcessor<Message>.Start(fun inbox -> 

                let rec loop users =
                    async {
                        let! m = inbox.Receive()

                        match m with
                        | CreateUser(id, name) ->
                            do! loop (Map.add id {Id=id; Name=name} users)

                        | UpdateName(id, newName) ->
                            match Map.tryFind id users with
                            | None -> 
                                do! loop users

                            | Some(user) ->
                                incr.Increment()
                                do! loop (Map.add id {user with Name = newName} users)
                    }

                loop Map.empty
            )
        ) 
        |> Seq.take 8
        |> Array.ofSeq

    printf "Create Time %i\n" sw.ElapsedMilliseconds

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(CreateUser(i, sprintf "User%i-UpdateName" i));

    incr.Start()

    for i in 0 .. 99999 do
        mb.[i % mb.Length].Post(UpdateName(i, sprintf "User%i-UpdateName" i));

    System.Console.ReadLine() |> ignore

    0

So my question is: am I doing something wrong? Have I misunderstood how the MailboxProcessor is supposed to be used? Or is this the performance that is expected?

Update:

So I got hold of some people on ##fsharp @ irc.freenode.net, who informed me that using sprintf is very slow, and as it turns out that is where a large part of my performance problems were coming from. But after removing the sprintf operations above and just using the same name for every User, I still end up with about 400ms for doing the operations, which feels really slow.

thr asked Jun 28 '13 08:06


1 Answer

> Now, I realize there is overhead from all the queueing on the ThreadPool, setting/resetting AutoResetEvents, etc. internally in the MailboxProcessor.

And printf, Map, Seq and contending for your global mutable Inc. And you're leaking heap-allocated stack frames. In fact, only a small proportion of the time taken to run your benchmark has anything to do with MailboxProcessor.

> But is this really the expected performance?

I am not surprised by the performance of your program but it does not say much about the performance of MailboxProcessor.

> From reading both MSDN and various blogs on the MailboxProcessor I have gotten the idea that it is meant to be akin to Erlang actors, but from the abysmal performance I am seeing this doesn't seem to hold true in reality?

The MailboxProcessor is conceptually somewhat similar to part of Erlang. The abysmal performance you're seeing is due to a variety of things, some of which are quite subtle and will affect any such program.

> So my question is: am I doing something wrong?

I think you're doing a few things wrong. Firstly, the problem you are trying to solve is not clear so this sounds like an XY problem question. Secondly, you're trying to benchmark the wrong things (e.g. you are complaining about microsecond times required to create a MailboxProcessor but may intend to do so only when a TCP connection is established which takes several orders of magnitude longer). Thirdly, you have written a benchmark program that measures the performance of some things but have attributed your observations to completely different things.

Let's look at your benchmark program in more detail. Before we do anything else, let's fix some bugs. You should always use sw.Elapsed.TotalSeconds to measure time because it is more precise. You should always recur in an async workflow using return! and not do! or you will leak stack frames.
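As a minimal sketch of those two fixes in isolation (the CounterMsg type and the startCounter and runCounter names are hypothetical, introduced only for illustration and not taken from the question), an agent loop that recurs with return! and is timed with Elapsed.TotalSeconds might look like this:

```fsharp
open System.Diagnostics

// Hypothetical message type for illustration: Add carries a value,
// Get asks the agent for its running total.
type CounterMsg =
    | Add of int
    | Get of AsyncReplyChannel<int>

let startCounter () =
    MailboxProcessor<CounterMsg>.Start(fun inbox ->
        let rec loop total =
            async {
                let! msg = inbox.Receive()
                match msg with
                | Add n ->
                    // return! is a tail call in the async workflow, so the
                    // current iteration is not kept alive by the next one.
                    return! loop (total + n)
                | Get reply ->
                    reply.Reply total
                    return! loop total
            }
        loop 0)

// Posts nMsgs messages to one agent and returns (total, elapsed seconds).
let runCounter (nMsgs: int) =
    let agent = startCounter ()
    let sw = Stopwatch.StartNew()
    for _ in 1 .. nMsgs do
        agent.Post(Add 1)
    // Messages are processed in order, so the reply arrives only after
    // every earlier Add has been handled.
    let total = agent.PostAndReply(fun reply -> Get reply)
    total, sw.Elapsed.TotalSeconds
```

Calling runCounter 100000 returns the final count together with the elapsed time as a float number of seconds, which can be printed with %f as in the snippets in this answer.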

My initial timings are:

Creation stage: 0.858s
Post stage: 1.18s

Next, let's run a profile to make sure our program really is spending most of its time thrashing the F# MailboxProcessor:

77%    Microsoft.FSharp.Core.PrintfImpl.gprintf(...)
 4.4%  Microsoft.FSharp.Control.MailboxProcessor`1.Post(!0)

Clearly not what we'd hoped. Thinking more abstractly, we are generating lots of data using things like sprintf and then applying it but we're doing the generation and application together. Let's separate out our initialization code:

let ids = Array.init 100000 (fun id -> {Id = id; Name = sprintf "User%i" id})
...
    ids
    |> Array.map (fun id ->
        MailboxProcessor<Message>.Start(fun inbox -> 
...
            loop id
...
    printf "Create Time %fs\n" sw.Elapsed.TotalSeconds
    let fxs =
      [|for i in 0 .. 99999 ->
          mb.[i % mb.Length].Post, UpdateName(i, sprintf "User%i-UpdateName" i)|]
    incr.Start()
    for f, x in fxs do
      f x
...

Now we get:

Creation stage: 0.538s
Post stage: 0.265s

So creation is 60% faster and posting is 4.5x faster.

Let's try completely rewriting your benchmark:

do
  for nAgents in [1; 10; 100; 1000; 10000; 100000] do
    let timer = System.Diagnostics.Stopwatch.StartNew()
    use barrier = new System.Threading.Barrier(2)
    let nMsgs = 1000000 / nAgents
    let nAgentsFinished = ref 0
    let makeAgent _ =
      new MailboxProcessor<_>(fun inbox ->
        let rec loop n =
          async { let! () = inbox.Receive()
                  let n = n+1
                  if n=nMsgs then
                    let n = System.Threading.Interlocked.Increment nAgentsFinished
                    if n = nAgents then
                      barrier.SignalAndWait()
                  else
                    return! loop n }
        loop 0)
    let agents = Array.init nAgents makeAgent
    for agent in agents do
      agent.Start()
    printfn "%fs to create %d agents" timer.Elapsed.TotalSeconds nAgents
    timer.Restart()
    for _ in 1..nMsgs do
      for agent in agents do
        agent.Post()
    barrier.SignalAndWait()
    printfn "%fs to post %d msgs" timer.Elapsed.TotalSeconds (nMsgs * nAgents)
    timer.Restart()
    for agent in agents do
      use agent = agent
      ()
    printfn "%fs to dispose of %d agents\n" timer.Elapsed.TotalSeconds nAgents

In this version each agent must receive nMsgs messages before it increments the shared counter, which greatly reduces the performance impact of that shared counter. This program also examines performance with different numbers of agents. On this machine I get:

Agents  M msgs/s
     1    2.24
    10    6.67
   100    7.58
  1000    5.15
 10000    1.15
100000    0.36

So it appears that part of the reason for the lower msgs/s speed you are seeing is the unusually large number (100,000) of agents. With 10-1,000 agents the F# implementation is over 10x faster than it is with 100,000 agents.
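One way to stay in that range while still serializing the updates for each user is to shard users across a fixed pool of agents, much as the second program in the question does. The following is only a sketch under assumptions: the ShardMsg type, the startShard and shardFor names and the pool size of 64 are all hypothetical, and a mutable Dictionary owned by each agent stands in for the immutable Map.

```fsharp
open System.Collections.Generic

// Hypothetical message type for illustration; not from the original post.
type ShardMsg =
    | SetName of int * string
    | GetName of int * AsyncReplyChannel<string option>

let startShard () =
    MailboxProcessor<ShardMsg>.Start(fun inbox ->
        // Only this agent ever touches the dictionary, so no locking is needed.
        let users = Dictionary<int, string>()
        let rec loop () =
            async {
                let! msg = inbox.Receive()
                match msg with
                | SetName(id, name) -> users.[id] <- name
                | GetName(id, reply) ->
                    match users.TryGetValue id with
                    | true, name -> reply.Reply(Some name)
                    | _ -> reply.Reply None
                return! loop ()
            }
        loop ())

// Assumed pool size; pick it from measurements on your own workload.
let nShards = 64
let shards = Array.init nShards (fun _ -> startShard ())

// All messages for a given user id go to the same agent, so they are
// processed in the order they were posted.
let shardFor (userId: int) =
    shards.[((userId % nShards) + nShards) % nShards]
```

For example, (shardFor 42).Post(SetName(42, "Ann")) followed by (shardFor 42).PostAndReply(fun ch -> GetName(42, ch)) goes through a single agent, so the read observes the earlier write.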

So if you can make do with this kind of performance then you should be able to write your entire application in F#, but if you need to eke out much more performance I would recommend using a different approach. You may not even have to sacrifice using F# (and you can certainly use it for prototyping) by adopting a design like the Disruptor. In practice, I have found that the time spent doing serialization on .NET tends to be much larger than the time spent in F# async and MailboxProcessor.

J D answered Sep 29 '22 10:09