I have a program written in Scala. I wanted to measure the execution time of different independent code blocks. When I did it in the obvious way (i.e. inserting System.nanoTime()
before and after each block), I observed that the the execution time depends on the ordering of the blocks. The first some blocks always took more time than the others.
I created a minimalistic example that reproduces this behaviour. All code blocks are the same and call hashCode()
for an array of integers, for simplicity.
package experiments
import scala.util.Random
/**
* Measuring execution time of a code block
*
* Minimalistic example
*/
object CodeBlockMeasurement {
def main(args: Array[String]): Unit = {
val numRecords = args(0).toInt
// number of independent measurements
val iterations = args(1).toInt
// Changes results a little bit, but not too much
// val records2 = Array.fill[Int](1)(0)
// records2.foreach(x => {})
for (_ <- 1 to iterations) {
measure(numRecords)
}
}
def measure(numRecords: Int): Unit = {
// using a new array every time
val records = Array.fill[Int](numRecords)(new Random().nextInt())
// block of code to be measured
def doSomething(): Unit = {
records.foreach(k => k.hashCode())
}
// measure execution time of the code-block
elapsedTime(doSomething(), "HashCodeExperiment")
}
def elapsedTime(block: => Unit, name: String): Unit = {
val t0 = System.nanoTime()
val result = block
val t1 = System.nanoTime()
// print out elapsed time in milliseconds
println(s"$name took ${(t1 - t0).toDouble / 1000000} ms")
}
}
After running the program with numRecords = 100000
and iterations = 10
, my console looks like this:
HashCodeExperiment took 14.630283 ms
HashCodeExperiment took 7.125693 ms
HashCodeExperiment took 0.368151 ms
HashCodeExperiment took 0.431628 ms
HashCodeExperiment took 0.086455 ms
HashCodeExperiment took 0.056458 ms
HashCodeExperiment took 0.055138 ms
HashCodeExperiment took 0.062997 ms
HashCodeExperiment took 0.063736 ms
HashCodeExperiment took 0.056682 ms
Can somebody explain why is that? Shouldn't all be the same? Which is the real execution time?
Thanks a lot,
Peter
Environment parameters:
OS: ubuntu 14.04 LTS (64 bit)
IDE: IntelliJ IDEA 2016.1.1 (IU-145.597)
Scala: 2.11.7
It's Java's JIT kicking in. Initially the plain bytecode is executed but after some time (1.5k/10k invocations by default for Oracle JVM, see -XX:CompileThreshold
) the optimizations start processing the actual executed native code which usually results in quite drastic performance improvements.
As Ivan mentions, then there's caching of intermediate bytecode/native code and various other technologies involved, one of the most significant ones being the Garbage Collector itself which cause even more variance to individual results. Depending how heavily the code allocates new objects this might absolutely trash performance whenever GC occurs, but that's a separate issue.
To remove such outlier results when microbenchmarking it is recommendable that you benchmark multiple iterations of the action and discard the bottom and top 5..10% results and do the performance estimation based on remaining samples.
Short answer: caching.
These are independent code blocks, but runs can't be fully independent because they are run in same JVM instance, and in same process of same CPU. JVM itself has a lot of optimization inside, including caching. Modern CPU also do so. So, as result, it's quite common behavior, that re-run usually takes less time than first run.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With