Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why do the execution time of independent code blocks depend on execution order in Scala? [duplicate]

I have a program written in Scala. I wanted to measure the execution time of different independent code blocks. When I did it in the obvious way (i.e. inserting System.nanoTime() before and after each block), I observed that the the execution time depends on the ordering of the blocks. The first some blocks always took more time than the others.

I created a minimalistic example that reproduces this behaviour. All code blocks are the same and call hashCode() for an array of integers, for simplicity.

package experiments

import scala.util.Random

/**
  * Measuring execution time of a code block
  *
  * Minimalistic example
  */
object CodeBlockMeasurement {

  def main(args: Array[String]): Unit = {
    val numRecords = args(0).toInt
    // number of independent measurements
    val iterations = args(1).toInt

    // Changes results a little bit, but not too much
    // val records2 = Array.fill[Int](1)(0)
    // records2.foreach(x => {})

    for (_ <- 1 to iterations) {
      measure(numRecords)
    }
  }

  def measure(numRecords: Int): Unit = {
    // using a new array every time
    val records = Array.fill[Int](numRecords)(new Random().nextInt())
    // block of code to be measured
    def doSomething(): Unit = {
      records.foreach(k => k.hashCode())
    }
    // measure execution time of the code-block
    elapsedTime(doSomething(), "HashCodeExperiment")
  }

  def elapsedTime(block: => Unit, name: String): Unit = {
    val t0 = System.nanoTime()
    val result = block
    val t1 = System.nanoTime()
    // print out elapsed time in milliseconds
    println(s"$name took ${(t1 - t0).toDouble / 1000000} ms")
  }
}

After running the program with numRecords = 100000 and iterations = 10, my console looks like this:

HashCodeExperiment took 14.630283 ms
HashCodeExperiment took 7.125693 ms
HashCodeExperiment took 0.368151 ms
HashCodeExperiment took 0.431628 ms
HashCodeExperiment took 0.086455 ms
HashCodeExperiment took 0.056458 ms
HashCodeExperiment took 0.055138 ms
HashCodeExperiment took 0.062997 ms
HashCodeExperiment took 0.063736 ms
HashCodeExperiment took 0.056682 ms

Can somebody explain why is that? Shouldn't all be the same? Which is the real execution time?

Thanks a lot,
Peter

Environment parameters:
OS: ubuntu 14.04 LTS (64 bit)
IDE: IntelliJ IDEA 2016.1.1 (IU-145.597)
Scala: 2.11.7

like image 935
nemderogatorius Avatar asked Dec 18 '22 17:12

nemderogatorius


2 Answers

It's Java's JIT kicking in. Initially the plain bytecode is executed but after some time (1.5k/10k invocations by default for Oracle JVM, see -XX:CompileThreshold) the optimizations start processing the actual executed native code which usually results in quite drastic performance improvements.

As Ivan mentions, then there's caching of intermediate bytecode/native code and various other technologies involved, one of the most significant ones being the Garbage Collector itself which cause even more variance to individual results. Depending how heavily the code allocates new objects this might absolutely trash performance whenever GC occurs, but that's a separate issue.

To remove such outlier results when microbenchmarking it is recommendable that you benchmark multiple iterations of the action and discard the bottom and top 5..10% results and do the performance estimation based on remaining samples.

like image 98
Esko Avatar answered May 14 '23 23:05

Esko


Short answer: caching.

These are independent code blocks, but runs can't be fully independent because they are run in same JVM instance, and in same process of same CPU. JVM itself has a lot of optimization inside, including caching. Modern CPU also do so. So, as result, it's quite common behavior, that re-run usually takes less time than first run.

like image 42
Ivan Klass Avatar answered May 14 '23 21:05

Ivan Klass