My stream has an even mix of CPU bound and IO bound stages (every IO stage is followed by a CPU stage). What I want to do is put the IO operations on a different dispatcher than the rest of the stream.
In a traditional actor based Akka application I could have put my IO actors on a fixed thread pool dispatcher with a lot of threads while putting the CPU bound actors on a fork join pool with a small number of threads (some multiple, ideally 1, of the number of cores). That should reduce time wasted in thread switching for the CPU bound actors while increasing throughput by having a lot of threads blocking on IO.
Is this understanding right? If not, why? If yes, then how do I put my IO bound stages (Flows) on a separate dispatcher from the rest of the stream?
I have tried turning off auto-fusing, and that does help. But throughput is still much lower than in the almost equivalent actor-based Akka version.
By default, all stages in a flow run on the same actor. You can mark a stage to run on a separate dispatcher using attributes, like so:
stage.withAttributes(ActorAttributes.dispatcher("dispatcher-name"))
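For a fuller picture, here is a minimal sketch of how the pieces might fit together, assuming Akka 2.6+ (where the materializer is derived implicitly from the `ActorSystem`). The dispatcher name `blocking-io-dispatcher`, the pool size of 16, and the `Thread.sleep` standing in for real IO are all made up for illustration; normally the dispatcher block would live in `application.conf`.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorAttributes
import akka.stream.scaladsl.{Flow, Sink, Source}
import com.typesafe.config.ConfigFactory

import scala.concurrent.Await
import scala.concurrent.duration._

object IoDispatcherSketch {
  // Hypothetical dispatcher: a fixed thread pool sized for blocking IO.
  // The name and pool size are assumptions, not values from the question.
  private val config = ConfigFactory.parseString(
    """
    blocking-io-dispatcher {
      type = Dispatcher
      executor = "thread-pool-executor"
      thread-pool-executor {
        fixed-pool-size = 16
      }
      throughput = 1
    }
    """).withFallback(ConfigFactory.load())

  def run(): Seq[(Int, String)] = {
    implicit val system: ActorSystem = ActorSystem("sketch", config)

    // Stand-in for a blocking IO stage; records the thread it ran on.
    val ioStage = Flow[Int]
      .map { n =>
        Thread.sleep(10) // simulated blocking call
        (n, Thread.currentThread().getName)
      }
      .withAttributes(ActorAttributes.dispatcher("blocking-io-dispatcher"))

    // Stand-in for a CPU-bound stage; stays on the default dispatcher.
    val cpuStage = Flow[(Int, String)].map { case (n, thread) => (n * 2, thread) }

    try Await.result(Source(1 to 5).via(ioStage).via(cpuStage).runWith(Sink.seq), 10.seconds)
    finally system.terminate()
  }
}
```

Calling `IoDispatcherSketch.run()` returns each doubled element together with the name of the thread that ran the IO stage, which is a quick way to check that the stage really left the default dispatcher.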
This will also introduce asynchronous boundaries around that stage, effectively running it in its own actor. To keep the asynchronous boundary from becoming too costly, the stage will now request elements from upstream in batches rather than one at a time (by default its input buffer holds up to 16 elements), so this is something you must be aware of.
The buffer size can be tweaked with an additional attribute. The setting below makes the stage behave like fused stages in that it asks for one element at a time. Note that this can add too much overhead, depending on the use case.
stage.withAttributes(Attributes.inputBuffer(1, 1))
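To apply both the dispatcher and the buffer size to the same stage, one option is to combine the attributes before attaching them. My understanding is that `withAttributes` replaces whatever attributes were previously set on the stage, so chaining two `withAttributes` calls may drop the first one. The sketch below uses `and` to merge them; the dispatcher name `blocking-io-dispatcher` and the placeholder `map` are hypothetical.

```scala
import akka.stream.{ActorAttributes, Attributes}
import akka.stream.scaladsl.Flow

object BufferedIoStage {
  // Dispatcher plus a one-element input buffer, merged into one Attributes value.
  // "blocking-io-dispatcher" is an assumed name that must exist in the config.
  val attrs: Attributes =
    ActorAttributes
      .dispatcher("blocking-io-dispatcher")
      .and(Attributes.inputBuffer(initial = 1, max = 1))

  // Placeholder for a real IO stage; only the attribute wiring matters here.
  val ioStage = Flow[Int].map(identity).withAttributes(attrs)
}
```

Whether a buffer of 1 is a good idea depends on the workload: it keeps demand tight, but every element then pays the full cost of crossing the asynchronous boundary.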
Relevant parts of the docs: