I do know how to create my own ExecutionContext or to import the Play framework global one. But I must admit I am far from being an expert on how multiple contexts/execution services work under the hood.
So my question is: for better performance/behaviour of my service, which ExecutionContext should I use?
I tested two options:
import play.api.libs.concurrent.Execution.defaultContext
and
implicit val executionContext = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()))
Both resulted in comparable performance.
The action I use is implemented like this in Play Framework 2.1.x. SedisPool is my own object that adds Future wrapping around a normal sedis/jedis client pool.
def testaction(application: String, platform: String) = Action {
  Async(
    SedisPool.withAsyncClient[Result] { client =>
      client.get(StringBuilder.newBuilder.append(application).append('-').append(platform).toString) match {
        case Some(x) => Ok(x)
        case None => Results.NoContent
      }
    })
}
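SedisPool is my own code, but for context, a minimal sketch of what such a wrapper can look like follows. The trait, names and the in-memory stand-in client are my own illustration, not the real sedis/jedis API; the point is simply that each blocking call is run inside a Future on a dedicated ExecutionContext so the default pool is never blocked.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object SedisPoolSketch {
  // Stand-in for a blocking sedis/jedis client (illustration only).
  trait Client { def get(key: String): Option[String] }

  // Dedicated pool for blocking Redis calls, separate from Play's default context.
  private val redisContext: ExecutionContext =
    ExecutionContext.fromExecutorService(
      Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors()))

  // Fake client backed by a Map, so the sketch is self-contained.
  private val fakeClient: Client = new Client {
    private val store = Map("myapp-ios" -> "some-value")
    def get(key: String): Option[String] = store.get(key)
  }

  // Run the blocking body on the dedicated context and expose it as a Future.
  def withAsyncClient[T](body: Client => T): Future[T] =
    Future(body(fakeClient))(redisContext)
}
```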
Performance-wise this behaves as well as, or slightly slower than, the exact same function in Node.js and Go, but still slower than PyPy, and way faster than the same thing in Java (using blocking calls to Redis through jedis). We load tested with Gatling. We were running a "competition" of technologies for simple services on top of Redis, and the criterion was "with the same amount of effort from coders". I already tested this using fyrie (and apart from the fact that I do not like the API) it behaved almost the same as this sedis implementation.
But that's beside my question; I just want to learn more about this part of Play Framework/Scala.
Is there an advised behaviour, or could someone point me in a better direction? I am just starting with Scala and am far from an expert, but I can walk myself through code answers.
Thanks for any help.
After tampering with the number of threads in the pool, I found that Runtime.getRuntime().availableProcessors() * 20
gives around a 15% to 20% performance boost to my service (measured in requests per second and in average response time), which actually makes it slightly better than Node.js and Go (barely, though). So I now have more questions:
- I tested 15x and 25x, and 20x seems to be a sweet spot. Why? Any ideas?
- Would there be other settings that might be better? Other "sweet spots"?
- Is 20x the sweet spot, or is it dependent on other parameters of the machine/JVM I am running on?
I found more information in the Play framework docs: http://www.playframework.com/documentation/2.1.0/ThreadPools
For IO they advise something similar to what I've done, but give a way to do it through Akka dispatchers that are configurable through *.conf files (this should make my ops happy).
So now I am using
implicit val redis_lookup_context: ExecutionContext = Akka.system.dispatchers.lookup("simple-redis-lookup")
with the dispatcher configured by
akka {
  event-handlers = ["akka.event.slf4j.Slf4jEventHandler"]
  loglevel = WARNING
  actor {
    simple-redis-lookup = {
      fork-join-executor {
        parallelism-factor = 20.0
        #parallelism-min = 40
        #parallelism-max = 400
      }
    }
  }
}
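To see what the settings above actually do, it helps to know how Akka turns parallelism-factor into a pool size: it multiplies the factor by the number of cores, rounds up, and clamps the result between parallelism-min and parallelism-max. A minimal sketch of that formula (the function name is mine, mirroring the clamp behaviour described in the Akka docs):

```scala
// Pool size = clamp(parallelism-min, ceil(cores * parallelism-factor), parallelism-max)
def poolSize(cores: Int, factor: Double, min: Int, max: Int): Int =
  math.min(max, math.max(math.ceil(cores * factor).toInt, min))
```

So with parallelism-factor = 20.0 on an 8-core box, the dispatcher gets a pool of 160 threads unless the min/max bounds kick in.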
It gave me around a 5% boost (eyeballing it), and more stable performance once the JVM was "hot". And my sysops are happy to play with those settings without rebuilding the service.
My questions still stand, though. Why these numbers?
The way I think about optimization is to look first at single-threaded performance, then at multi-threading.
Single-threaded optimization
The performance of a single thread will typically be gated on a single component or section of your code; it might be the CPU itself, memory access, disk or network IO, or the latency of a remote service (here, Redis).
However, latencies in a single thread are not so worrisome if you can run multiple threads: while one thread is blocked, another can use the CPU (at the cost of swapping out the context and replacing most of the items in the CPU cache). So how many threads should you run?
Multi-threading
Let's assume that the thread spends about 50% of the time on the CPU and 50% waiting for IO. In that case, each CPU can be fully utilized by 2 threads, and you see a 2x throughput improvement. If the thread spends about 1% of the time using CPU, you should (all things being equal) be able to run 100 threads concurrently.
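The rule of thumb in the previous paragraph can be written down directly: if each thread spends a fraction f of its time on the CPU (and the rest blocked on IO), then roughly 1/f threads can keep one core busy. A one-line sketch (the function name is mine):

```scala
// If each thread uses the CPU for fraction `cpuFraction` of its time,
// roughly 1 / cpuFraction threads can fully utilize one core.
def threadsPerCore(cpuFraction: Double): Long =
  math.round(1.0 / cpuFraction)
```

This reproduces the examples above: 50% CPU time gives 2 threads per core, 1% gives 100.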
However, this is where a lot of weird effects can occur: as you increase the number of threads n, you will never quite get an n-times throughput improvement, and past a critical point, increasing n further will actually decrease performance. If this happens, you need to either rethink your algorithm; change the server, network or network services; or decrease parallelism.
Factors that affect how many threads you can run
From the above, you can see that there are a metric ton of factors involved. As a result, the sweet spot of threads per core is an accident of multiple causes, including CPU speed and cache size, memory bandwidth, the latency and throughput of the network and of the Redis server itself, and the JVM's threading and GC behaviour.
From experience, there is no magic formula to compute the best number of threads a priori. This problem is best tackled empirically, just as you have done. If you need to generalize, you will need to sample performance over different CPU architectures, memory and networks on the operating system of your choice.
Several easily observed metrics are useful here: CPU utilization per core, the rate of context switches, network throughput, and GC activity.
If you need to optimize, get the best profiling tools you can. You would need a specific tool for monitoring the operating system (e.g. DTrace for Solaris) and one for the JVM (I personally love JProfiler). These tools will allow you to zoom in on precisely the areas I describe above.
Conclusions
It happens that your particular code, on your particular Scala library version, JVM version, OS, server and Redis server, runs so that each thread waits for IO about 95% of the time. (If you ran single-threaded, you'd find the CPU load to be about 5%.)
This allows about 20 threads (1 / 0.05) to share each CPU optimally in this configuration.
This is the sweet spot because fewer threads would leave the CPU idle while waiting on Redis, and more threads would add scheduling and cache-churn overhead without any extra useful work to do.
Have you tried changing your thread pool type or its other settings, for example a thread-pool-executor instead of the fork-join-executor?
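As one possibility (my assumption, following the Akka dispatcher configuration format rather than anything measured here), the same dispatcher could be switched to a thread-pool-executor, which sizes its pool with its own min/factor/max settings:

```
simple-redis-lookup = {
  executor = "thread-pool-executor"
  thread-pool-executor {
    core-pool-size-factor = 20.0
    core-pool-size-max = 400
  }
}
```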