Why is my Scala tail-recursion faster than the while loop?

Tags:

Here are two solutions to exercise 4.9 in Cay Horstmann's Scala for the Impatient: "Write a function lteqgt(values: Array[Int], v: Int) that returns a triple containing the counts of values less than v, equal to v, and greater than v." One uses tail recursion, the other uses a while loop. I thought that both would compile to similar bytecode but the while loop is slower than the tail recursion by a factor of almost 2. This suggests to me that my while method is badly written.

import scala.annotation.tailrec import scala.util.Random object PerformanceTest {    def main(args: Array[String]): Unit = {     val bigArray:Array[Int] = fillArray(new Array[Int](100000000))     println(time(lteqgt(bigArray, 25)))     println(time(lteqgt2(bigArray, 25)))   }    def time[T](block : => T):T = {     val start = System.nanoTime : Double     val result = block     val end = System.nanoTime : Double     println("Time = " + (end - start) / 1000000.0 + " millis")     result   }    @tailrec def fillArray(a:Array[Int], pos:Int=0):Array[Int] = {     if (pos == a.length)       a     else {       a(pos) = Random.nextInt(50)       fillArray(a, pos+1)     }   }    @tailrec def lteqgt(values: Array[Int], v:Int, lt:Int=0, eq:Int=0, gt:Int=0, pos:Int=0):(Int, Int, Int) = {     if (pos == values.length)       (lt, eq, gt)     else       lteqgt(values, v, lt + (if (values(pos) < v) 1 else 0), eq + (if (values(pos) == v) 1 else 0), gt + (if (values(pos) > v) 1 else 0), pos+1)    }    def lteqgt2(values:Array[Int], v:Int):(Int, Int, Int) = {     var lt = 0     var eq = 0     var gt = 0     var pos = 0     val limit = values.length     while (pos < limit) {       if (values(pos) > v)         gt += 1       else if (values(pos) < v)         lt += 1       else         eq += 1       pos += 1     }     (lt, eq, gt)   } }

Adjust the size of bigArray according to your heap size. Here is some sample output:

Time = 245.110899 millis (50004367,2003090,47992543) Time = 465.836894 millis (50004367,2003090,47992543)

Why is the while method so much slower than the tailrec? Naively the tailrec version looks to be at a slight disadvantage, as it must always perform 3 "if" checks for every iteration, whereas the while version will often only perform 1 or 2 tests due to the else construct. (NB reversing the order I perform the two methods does not affect the outcome).

230

asked Feb 06 '12 22:02

waifnstray

1 Answers

Test results (after reducing array size to 20000000)

Under Java 1.6.22 I get 151 and 122 ms for tail-recursion and while-loop respectively.

Under Java 1.7.0 I get 55 and 101 ms

So under Java 6 your while-loop is actually faster; both have improved in performance under Java 7, but the tail-recursive version has overtaken the loop.

Explanation

The performance difference is due to the fact that in your loop, you conditionally add 1 to the totals, while for recursion you always add either 1 or 0. So they are not equivalent. The equivalent while-loop to your recursive method is:

  def lteqgt2(values:Array[Int], v:Int):(Int, Int, Int) = {     var lt = 0     var eq = 0     var gt = 0     var pos = 0     val limit = values.length     while (pos < limit) {       gt += (if (values(pos) > v) 1 else 0)       lt += (if (values(pos) < v) 1 else 0)       eq += (if (values(pos) == v) 1 else 0)       pos += 1     }     (lt, eq, gt)   }

and this gives exactly the same execution time as the recursive method (regardless of Java version).

Discussion

I'm not an expert on why the Java 7 VM (HotSpot) can optimize this better than your first version, but I'd guess it's because it's taking the same path through the code each time (rather than branching along the if / else if paths), so the bytecode can be inlined more efficiently.

But remember that this is not the case in Java 6. Why one while-loop outperforms the other is a question of JVM internals. Happily for the Scala programmer, the version produced from idiomatic tail-recursion is the faster one in the latest version of the JVM.

The difference could also be occurring at the processor level. See this question, which explains how code slows down if it contains unpredictable branching.

109

answered Oct 11 '22 11:10

Luigi Plinge

Related questions
                            
                                Efficient dot products of large memory-mapped arrays
                            
                                Numpy: What is special about division by 0.5?
                            
                                How do I layout nested RecyclerViews, while remaining performant?
                            
                                Most efficient algorithm for merging sorted IEnumerable<T>
                            
                                What is better in MYSQL count(*) or count(1)?
                            
                                In C, why is "signed int" faster than "unsigned int"?
                            
                                Comparing BSXFUN and REPMAT
                            
                                WPF Datagrid Performance
                            
                                Why is std::atomic<bool> much slower than volatile bool?
                            
                                async await performance?
                            
                                Can jconsole data be retrieved from the command line?
                            
                                Why is it okay that this struct is mutable? When are mutable structs acceptable?
                            
                                At what point is it worth reusing arrays in Java?
                            
                                Is regex too slow? Real life examples where simple non-regex alternative is better
                            
                                ViewBag vs ViewData performance difference in MVC?
                            
                                What's the recommended way of creating objects in NodeJS?
                            
                                Why are loops always compiled into "do...while" style (tail jump)?
                            
                                Windows 7 php + Symfony2 terribly slow
                            
                                Why is Erlang slower than Java on all these small math benchmarks?
                            
                                WPF VirtualizingStackPanel for increased performance

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why is my Scala tail-recursion faster than the while loop?

Tags:

performance

loops

tail-recursion

scala

waifnstray

People also ask

1 Answers

Luigi Plinge

Recent Activity

Donate For Us