Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are Scala "for loop comprehensions" so very slow compared to FOR loops?

It has been said that Scala For comprehensions are actually quite slow. The reason I was given was that due to a Java limitation, for comprehensions (such as "reduce", used below) need to generate a temporary object with each iteration in order to call the function passed in.

IS...THIS...TRUE? The tests below seem to bear this out, but I do not completely understand why this is the case.

This might make sense for "lambdas" or anonymous functions, but not for non-anonymous functions.

In my test, I ran for loops against list.reduce (see code below), and found them to be over twice as fast, even when each iteration called the exact same function that was passed to reduce!

I find this amazingly counter-intuitive (once would think that the Scala library would've been carefully created to be as optimal as possible).

In a test I put together, I ran the same loop (sum up the numbers 1 to a million, irrespective of overflow) five different ways:

  1. for loop over the array of values
  2. for loop, but calling a function instead of inline arithmetic
  3. for loop, creating an object that contains an addition function
  4. list.reduce, passing i an anonymous function
  5. list.reduce, passing in an object member function

Results were as follows: test: min/max/average (milliseconds)

1. 27/157/64.78
2. 27/192/65.77 <--- note the similarity between tests 1,2 and 4,5
3. 139/313/202.58
4. 63/342/150.18
5. 63/341/149.99

As can be seen, the "for comprehension" versions are on the order of "for with new for each instance", implying that a "new" may in fact be performed for both the anonymous and non-anonymous function versions.

Methodology: code below (test call removed) was compiled into a single .jar file to ensure all versions ran the same library code. Each test in each iteration was called in a new JVM (i.e. scala -cp ... for each test) in order to remove heap size issues.

class t(val i: Int) {
    def summit(j: Int) = j + i
}

object bar {
    val biglist:List[Int]  =  (1 to 1000000).toList

    def summit(i: Int, j:Int) = i+j

    // Simple for loop
    def forloop:  Int = {
        var result: Int = 0
        for(i <- biglist) {
            result += i
        }
        result
    }

    // For loop with a function instead of inline math
    def forloop2:  Int = {
        var result: Int = 0
        for(i <- biglist) {
            result = summit(result,i)
        }
        result
    }

    // for loop with a generated object PER iteration
    def forloop3: Int = {
        var result: Int = 0
        for(i <- biglist) {
            val t = new t(result)
            result = t.summit(i)
        }
        result
    }

    // list.reduce with an anonymous function passed in
    def anonymousfunc: Int = {
        biglist.reduce((i,j) => {i+j})
    }

    // list.reduce with a named function
    def realfunc: Int = {
        biglist.reduce(summit)
    }

    // test calling code excised for brevity. One example given:
    args(0) match {
        case "1" => {
                    val start = System.currentTimeMillis()
                    forloop
                    val end = System.currentTimeMillis()
                    println("for="+(end - start))
                    }
         ...
}
like image 928
Mark Gerolimatos Avatar asked May 28 '13 07:05

Mark Gerolimatos


1 Answers

What you were told was true about "for comprehensions", but the problem with your question is that you've mixed up "for comprehensions" with "anonymous functions".

"For comprehension" in Scala is a syntactic sugar for a series of .flatMap, .map and .filter applications. Since you are testing reduction algorithms and since it's impossible to implement a reduction algorithm using those three functions, your test cases are incorrect.

Here's an example of a "for comprehension":

val listOfLists = List(List(1,2), List(3,4), List(5))
val result = 
  for {
    itemOfListOfLists <- listOfLists
    itemOfItemOfListOfLists <- itemOfListOfLists
  }
  yield (itemOfItemOfListOfLists + 1)
assert( result == List(2,3,4,5,6) )

The compiler desugars the Comprehension part to the following:

val result =
  listOfLists.flatMap(
    itemOfListOfLists => itemOfListOfLists.map(
      itemOfItemOfListOfLists => itemOfItemOfListOfLists + 1
    )
  )

Then it desugars the Anonymous Function syntax:

val result =
  listOfLists.flatMap(
    new Function1[List[Int], List[Int]] {
      override def apply(itemOfListOfLists: List[Int]): List[Int] =
        itemOfListOfLists.map(
          new Function1[Int, Int] {
            override def apply(itemOfItemOfListOfLists: Int): Int =
              itemOfItemOfListOfLists + 1
          }
        )
    }
  )

From the desugarred code it's now evident that the Function1[Int, Int] class gets instantiated every time the apply(itemOfListOfLists: List[Int]): List[Int] method gets called. That happens for every entry of listOfLists. So the more complex your comprehension is the more instantiations of the Function objects you get.

like image 107
Nikita Volkov Avatar answered Sep 17 '22 11:09

Nikita Volkov