Is the following correct?
(And is the disruptor pattern still so much faster than other lockless multi-producer multi-consumer queues (e.g. from boost) when multiple producers (i.e. CAS operations) are involved?)
My situation in detail:
Processing an entry can produce several new entries, which must eventually be processed as well. Performance has the highest priority; processing entries in FIFO order has second priority.
In the current implementation, each thread uses a local FIFO, to which it adds its new entries. Idle threads steal work from other threads' local FIFOs. Dependencies between the threads' processing are resolved using a lockless, mechanically sympathetic hash table (CASs on write, with bucket granularity). This results in pretty low contention, but FIFO order is sometimes broken.
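To make the setup above concrete, here is a minimal Java sketch of per-thread FIFOs with stealing. It is not the real implementation: `LocalFifoStealingSketch`, `run`, and the depth-based entries are names made up for illustration, `ConcurrentLinkedDeque` stands in for whatever local FIFO is actually used, and the dependency hash table is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: an entry of depth d > 0 produces two entries of depth d - 1.
final class LocalFifoStealingSketch {
    static long run(int workers, int roots, int depth) {
        List<ConcurrentLinkedDeque<Integer>> queues = new ArrayList<>();
        for (int i = 0; i < workers; i++) queues.add(new ConcurrentLinkedDeque<>());
        AtomicInteger pending = new AtomicInteger(roots); // entries queued or in flight
        AtomicLong processed = new AtomicLong();
        for (int i = 0; i < roots; i++) queues.get(i % workers).addLast(depth);

        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int self = w;
            threads[w] = new Thread(() -> {
                while (pending.get() > 0) {
                    // Own queue first: oldest entry, so local order is FIFO.
                    Integer entry = queues.get(self).pollFirst();
                    // Idle: try to steal the oldest entry of each other worker in turn.
                    for (int k = 1; entry == null && k < workers; k++) {
                        entry = queues.get((self + k) % workers).pollFirst();
                    }
                    if (entry == null) {
                        Thread.onSpinWait();
                        continue;
                    }
                    processed.incrementAndGet();
                    if (entry > 0) {
                        pending.addAndGet(2); // count the children before publishing them
                        queues.get(self).addLast(entry - 1);
                        queues.get(self).addLast(entry - 1);
                    }
                    pending.decrementAndGet(); // this entry is finished
                }
            });
            threads[w].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 4, 3));
    }
}
```

Children always go onto the queue of the worker that processed their parent, so order is FIFO per queue only. Once an entry has been stolen, its descendants interleave with the thief's own entries, which is where the global FIFO order gets lost.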
Using the disruptor pattern would guarantee FIFO order. But wouldn't distributing the entries onto the threads cause much higher contention (e.g. CAS on a read cursor) than for local FIFOs with work stealing (each thread's throughput is about the same)?
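The kind of contention I am worried about can be sketched as follows: several consumers claim disjoint entries by CASing one shared read cursor. This is only an illustration under my own assumptions (`SharedCursorSketch` and `drain` are hypothetical names, and the entries are assumed to be already published); it does not use the Disruptor library and makes no claim about how that library distributes work internally.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: N consumers race on a single read cursor.
final class SharedCursorSketch {
    // Returns { number of entries claimed exactly once, number of failed CAS attempts }.
    static long[] drain(int consumers, int entries) {
        AtomicLong readCursor = new AtomicLong(0);   // next unclaimed sequence
        AtomicLong failedCas = new AtomicLong();
        AtomicLongArray claims = new AtomicLongArray(entries);

        Thread[] threads = new Thread[consumers];
        for (int c = 0; c < consumers; c++) {
            threads[c] = new Thread(() -> {
                long failures = 0;
                while (true) {
                    long seq = readCursor.get();
                    if (seq >= entries) break;
                    if (readCursor.compareAndSet(seq, seq + 1)) {
                        claims.incrementAndGet((int) seq); // "process" the claimed entry
                    } else {
                        failures++; // another consumer won the race; retry
                    }
                }
                failedCas.addAndGet(failures);
            });
            threads[c].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long claimedOnce = 0;
        for (int i = 0; i < entries; i++) {
            if (claims.get(i) == 1) claimedOnce++;
        }
        return new long[] { claimedOnce, failedCas.get() };
    }

    public static void main(String[] args) {
        long[] r = drain(4, 1_000_000);
        System.out.println("claimed=" + r[0] + " failedCas=" + r[1]);
    }
}
```

Entries are handed out in strict sequence order, but every claim touches the same cache line. The failed-CAS count varies from run to run and between machines, so it is only a rough indicator, not a benchmark.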
References I've found
The performance tests in the standard technical paper on the disruptor (Chapter 5 + 6) do not cover disjoint work distribution.
https://groups.google.com/forum/?fromgroups=#!topic/lmax-disruptor/tt3wQthBYd0 is the only reference I've found on disruptor + work stealing. It states that a queue per thread is dramatically slower if there is any shared state, but does not go into detail or explain why. I doubt that this sentence applies to my situation with:
Update. Bottom line up front, for max performance: you need to write both versions, each in the idiomatic style for its approach (Disruptor and work stealing), and then benchmark.
I think the distinction is primarily the split between message focus and task focus, and therefore the way you want to think about the problem. Look at how you would naturally solve your problem: if the solution is task focused, then Disruptor is a good fit. If it is message focused, you might be better served by another technique such as work stealing.
Use work stealing when your implementation is message focused. Each thread picks up a message and runs it through to completion. Take an HTTP server as an example: each inbound HTTP request is allocated a thread, and that thread handles the request from start to finish (logging the request, checking security controls, doing the vhost lookup, fetching the file, sending the response, and closing the connection).
Use Disruptor when your implementation is task focused. Each thread works on a particular stage of the processing. In the same HTTP example, the processing would be split into stages: one thread does logging, one does security controls, one does the vhost lookup, and so on. Each thread focuses on its task and passes the request to the next thread in the pipeline. Stages may be parallelised, but the overall structure is that a thread is dedicated to a specific task and hands the message along to the next thread.
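As a rough illustration of the task-focused shape, here is a small Java sketch with one thread per stage. It deliberately does not use the Disruptor API: plain `java.util.concurrent` blocking queues (which are lock-based) stand in for the ring buffer, and `StagePipelineSketch`, `stage`, and the string tags are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: each thread owns one stage and hands requests to the next.
final class StagePipelineSketch {
    // Unique sentinel instance, compared by identity, so no real request can collide with it.
    private static final String STOP = new String("STOP");

    // One pipeline stage: take from `in`, append `tag`, hand on to `out`.
    private static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out, String tag) {
        Thread t = new Thread(() -> {
            try {
                for (String r = in.take(); r != STOP; r = in.take()) {
                    out.put(r + tag);
                }
                out.put(STOP); // propagate shutdown downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    static List<String> run(List<String> requests) {
        BlockingQueue<String> inbound = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> logged = new ArrayBlockingQueue<>(16);
        BlockingQueue<String> checked = new ArrayBlockingQueue<>(16);
        // Unbounded, so the caller can submit everything before it starts draining.
        BlockingQueue<String> resolved = new LinkedBlockingQueue<>();
        Thread[] stages = {
            stage(inbound, logged, ">logged"),
            stage(logged, checked, ">checked"),
            stage(checked, resolved, ">resolved")
        };
        List<String> done = new ArrayList<>();
        try {
            for (String r : requests) inbound.put(r);
            inbound.put(STOP);
            for (String r = resolved.take(); r != STOP; r = resolved.take()) {
                done.add(r);
            }
            for (Thread t : stages) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("req1", "req2", "req3")));
    }
}
```

Because each stage has exactly one consumer, requests leave the pipeline in the order they entered it, which is the FIFO property the question cares about.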
Of course, you can change your implementation to suit each approach better.
In your case, I would structure the problem differently if you wanted to use Disruptor. Generally you would eliminate shared state by having a single thread own the state and pass all tasks through that thread of work - look up SEDA for lots of diagrams like this. This can have lots of benefits, but again, is really down to your implementation.
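A minimal sketch of the "single thread owns the state" idea, under stated assumptions: a single-threaded executor from `java.util.concurrent` plays the owner, its internal queue stands in for the ring buffer, and `SingleOwnerSketch`, `count`, and the `key0`/`key1` keys are hypothetical. The point is only that the map itself needs no CAS or locking, because exactly one thread ever touches it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: producers never touch the state, they only send updates to its owner.
final class SingleOwnerSketch {
    static Map<String, Integer> count(int producers, int perProducer) {
        Map<String, Integer> state = new HashMap<>(); // plain map, accessed only by the owner thread
        ExecutorService owner = Executors.newSingleThreadExecutor();
        try {
            Thread[] threads = new Thread[producers];
            for (int p = 0; p < producers; p++) {
                final String key = "key" + (p % 2);
                threads[p] = new Thread(() -> {
                    for (int i = 0; i < perProducer; i++) {
                        // The update runs on the owner thread, not on the producer.
                        owner.execute(() -> state.merge(key, 1, Integer::sum));
                    }
                });
                threads[p].start();
            }
            for (Thread t : threads) t.join();
            // Read the result through the owner as well, after all queued updates.
            Callable<Map<String, Integer>> snapshot = () -> new HashMap<>(state);
            return owner.submit(snapshot).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            owner.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(count(4, 1000));
    }
}
```

The trade-off is that the owner thread becomes the serial bottleneck, so whether this beats a CAS-based shared table depends on how much work each update does. That again comes down to benchmarking.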
Some more verbosity: