Scala code for simple addition using tail recursion <pre class="prettyprint"><code>def add(list : List[Int],sum:Int):Int = { //Thread.dumpStack() if (list.isEmpty) { sum } else { val headVal = list.head add(list.tail, sum + headVal) } } </code></pre> Below is the java code for addition in recursive mode. <pre class="prettyprint"><code>public static int add(List<Integer> list, Integer sum) { // Thread.dumpStack(); if (list.isEmpty()) { return sum; } else { int headVal = list.remove(0); return add(list, sum + headVal); } } </code></pre> The Java version runs at least 10 times faster. Ran this for 1000 entries. Measured time by using <code>System.nanoTime()</code> API before and after. Scala version 2.10, java version Java 7. Same JVM properties for both run.

First of all the Scala method <code>add</code> you have shown is not in a (class) context. If you have this method in a class, the tail recursion optimization cannot be applied, since the method is neither <code>final</code> nor <code>private</code>. If you add <code>@tailrec</code>, compilation will fail. If I run it with 10000, it results in a stack overflow. As for the Java version: The Java version is using a mutable <code>List</code>: The head/tail decomposition alters the underlying list. So after summing you can't use that list any longer, since it is empty. Further the <code>List</code> in Scala has a completely different meaning than a Java <code>List</code>; the Scala list is meant for head/tail decomposition. As far as I know java.util.List has no tail method, and neither does the Java compiler apply tailrec optimizations, so the comparision is "unfair". Anyway, I have run some JMH-based tests on different scenarios. The only two scenarios you can really compare are "Scala while" and "Java for". They neither use OOP nor functional programming, it's just imperative. The results of five different Scala scenarios (Please scroll to the right, there is a small description in the last column) <pre class="prettyprint"><code>Benchmark Mode Samples Mean Mean error Units a.b.s.b.Benchmark5a.run thrpt 10 238,515 7,769 ops/ms Like in the question, but with @tailrec a.b.s.b.Benchmark5b.run thrpt 10 130,202 2,232 ops/ms Using List.sum a.b.s.b.Benchmark5c.run thrpt 10 2756,132 29,920 ops/ms while (no List, vars, imperative) a.b.s.b.Benchmark5d.run thrpt 10 237,286 2,203 ops/ms tailrec version with pattern matching a.b.s.b.Benchmark5e.run thrpt 10 214,719 2,483 ops/ms Like in the question (= no tailrec opt) </code></pre> <ul> <li>5a and 5e are like in the question; 5a with <code>@tailrec</code>. </li> <li>5b: <code>List.sum</code>: Was very slow</li> <li>5c: Not a big surprise, the imperative version is the fastest one.</li> <li>5d uses pattern matching instead of <code>if</code> (that would have been "my style"), I am adding the source of this one:</li> </ul> <pre class="prettyprint lang-scala prettyprint-override"><code>package app.benchmark.scala.benchmark5 import scala.annotation._ import org.openjdk.jmh.annotations.GenerateMicroBenchmark import org.openjdk.jmh.annotations.Scope import org.openjdk.jmh.annotations.State import org.openjdk.jmh.runner.Runner import org.openjdk.jmh.runner.RunnerException import org.openjdk.jmh.runner.options.Options import org.openjdk.jmh.runner.options.OptionsBuilder @State(Scope.Benchmark) object BenchmarkState5d { val list = List.range(1, 1000) } class Benchmark5d { private def add(list : List[Int]): Int = { @tailrec def add(list : List[Int], sum: Int): Int = { list match { case Nil => sum case h :: t => add(t, h + sum) } } add(list, 0) } @GenerateMicroBenchmark def run() = { add(BenchmarkState5d.list) } } </code></pre> The three Java scenarios <pre class="prettyprint"><code>Benchmark Mode Samples Mean Mean error Units a.b.j.b.Benchmark5a.run thrpt 10 40,437 0,532 ops/ms mutable (rebuilds the list in each iteration) a.b.j.b.Benchmark5b.run thrpt 10 0,450 0,008 ops/ms subList a.b.j.b.Benchmark5c.run thrpt 10 2735,951 29,177 ops/ms for </code></pre> If you really want to compare in the meaning of functional programming style (= immutable, tail recursion, head/tail decomposition), than the Java version is five times slower. As pointed out in a comment by Marko Topolnik: <blockquote> <code>subList</code> doesn't copy the tail, but it does something comparably bad, when applied to a <code>LinkedList</code>: it wraps the original list and uses an offset to accomodate the semantics. The result is that an O(n) recursive algorithm becomes O(n2)---the same as if the tail was repeatedly copied. Plus, wrappers accrete so you end up with the list wrapped a thousand times. Definitely not comparable to a head/tail list </blockquote> <pre class="prettyprint lang-java prettyprint-override"><code>public class Benchmark5a { public static int add(List<Integer> list, Integer sum) { if (list.isEmpty()) { return sum; } else { int headVal = list.remove(0); return add(list, sum + headVal); } } @GenerateMicroBenchmark public long run() { final List<Integer> list = new LinkedList<Integer>(); for(int i = 0; i < 1000; i++) { list.add(i); } return add(list, 0); } public static void main(String[] args) { System.out.println(new Benchmark5a().run()); } } </code></pre> <hr> <pre class="prettyprint lang-java prettyprint-override"><code>@State(Scope.Benchmark) class BenchmarkState5b { public final static List<Integer> list = new LinkedList<Integer>(); static { for(int i = 0; i < 1000; i++) { list.add(i); } } } public class Benchmark5b { public static int add(List<Integer> list, int sum) { if (list.isEmpty()) { return sum; } else { int headVal = list.get(0); return add(list.subList(1, list.size()), sum + headVal); } } @GenerateMicroBenchmark public long run() { return add(BenchmarkState5b.list, 0); } public static void main(String[] args) { System.out.println(new Benchmark5b().run()); } } </code></pre> <hr> Scala results in detail (all results only show the last scenario, and the overall results) <pre class="prettyprint"><code>[...] # VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java # VM options: <none> # Fork: 1 of 1 # Warmup: 3 iterations, 1 s each # Measurement: 10 iterations, 1 s each # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Benchmark: app.benchmark.scala.benchmark5.Benchmark5e.run # Warmup Iteration 1: 166,153 ops/ms # Warmup Iteration 2: 215,242 ops/ms # Warmup Iteration 3: 216,632 ops/ms Iteration 1: 215,526 ops/ms Iteration 2: 213,720 ops/ms Iteration 3: 213,967 ops/ms Iteration 4: 215,468 ops/ms Iteration 5: 216,247 ops/ms Iteration 6: 217,514 ops/ms Iteration 7: 215,503 ops/ms Iteration 8: 211,969 ops/ms Iteration 9: 212,989 ops/ms Iteration 10: 214,291 ops/ms Result : 214,719 ±(99.9%) 2,483 ops/ms Statistics: (min, avg, max) = (211,969, 214,719, 217,514), stdev = 1,642 Confidence interval (99.9%): [212,236, 217,202] Benchmark Mode Samples Mean Mean error Units a.b.s.b.Benchmark5a.run thrpt 10 238,515 7,769 ops/ms a.b.s.b.Benchmark5b.run thrpt 10 130,202 2,232 ops/ms a.b.s.b.Benchmark5c.run thrpt 10 2756,132 29,920 ops/ms a.b.s.b.Benchmark5d.run thrpt 10 237,286 2,203 ops/ms a.b.s.b.Benchmark5e.run thrpt 10 214,719 2,483 ops/ms </code></pre> <hr> Java results in detail <pre class="prettyprint"><code># VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java # VM options: <none> # Fork: 1 of 1 # Warmup: 3 iterations, 1 s each # Measurement: 10 iterations, 1 s each # Threads: 1 thread, will synchronize iterations # Benchmark mode: Throughput, ops/time # Benchmark: app.benchmark.java.benchmark5.Benchmark5c.run # Warmup Iteration 1: 2777,495 ops/ms # Warmup Iteration 2: 2888,040 ops/ms # Warmup Iteration 3: 2692,851 ops/ms Iteration 1: 2737,169 ops/ms Iteration 2: 2745,368 ops/ms Iteration 3: 2754,105 ops/ms Iteration 4: 2706,131 ops/ms Iteration 5: 2721,593 ops/ms Iteration 6: 2769,261 ops/ms Iteration 7: 2734,461 ops/ms Iteration 8: 2741,494 ops/ms Iteration 9: 2740,012 ops/ms Iteration 10: 2709,915 ops/ms Result : 2735,951 ±(99.9%) 29,177 ops/ms Statistics: (min, avg, max) = (2706,131, 2735,951, 2769,261), stdev = 19,299 Confidence interval (99.9%): [2706,774, 2765,128] Benchmark Mode Samples Mean Mean error Units a.b.j.b.Benchmark5a.run thrpt 10 40,437 0,532 ops/ms a.b.j.b.Benchmark5b.run thrpt 10 0,450 0,008 ops/ms a.b.j.b.Benchmark5c.run thrpt 10 2735,951 29,177 ops/ms </code></pre> <hr> Update: Added another Java scenario 5d using <code>ArrayList</code> <pre class="prettyprint"><code>Benchmark Mode Samples Mean Mean error Units a.b.j.b.Benchmark5a.run thrpt 10 34,931 0,504 ops/ms a.b.j.b.Benchmark5b.run thrpt 10 0,430 0,005 ops/ms a.b.j.b.Benchmark5c.run thrpt 10 2610,085 9,664 ops/ms a.b.j.b.Benchmark5d.run thrpt 10 56,693 1,218 ops/ms </code></pre>

Why is Scala's tail recursion slower than that of Java?

Tags:

java

tail-recursion

jvm

scala

Scala code for simple addition using tail recursion

def add(list : List[Int],sum:Int):Int = {
    //Thread.dumpStack()
    if (list.isEmpty) {
        sum
    } else {
        val headVal  = list.head
        add(list.tail, sum + headVal)
    }
}

Below is the java code for addition in recursive mode.

public static int add(List<Integer> list, Integer sum) {
    // Thread.dumpStack();
    if (list.isEmpty()) {
        return sum;
    } else {
        int headVal = list.remove(0);
        return add(list, sum + headVal);
    }
}

The Java version runs at least 10 times faster. Ran this for 1000 entries. Measured time by using System.nanoTime() API before and after.

Scala version 2.10, java version Java 7. Same JVM properties for both run.

360

asked Mar 07 '15 14:03

RockSolid

2 Answers

First of all the Scala method add you have shown is not in a (class) context. If you have this method in a class, the tail recursion optimization cannot be applied, since the method is neither final nor private. If you add @tailrec, compilation will fail. If I run it with 10000, it results in a stack overflow.

As for the Java version: The Java version is using a mutable List: The head/tail decomposition alters the underlying list. So after summing you can't use that list any longer, since it is empty.

Further the List in Scala has a completely different meaning than a Java List; the Scala list is meant for head/tail decomposition. As far as I know java.util.List has no tail method, and neither does the Java compiler apply tailrec optimizations, so the comparision is "unfair".

Anyway, I have run some JMH-based tests on different scenarios.

The only two scenarios you can really compare are "Scala while" and "Java for". They neither use OOP nor functional programming, it's just imperative.

The results of five different Scala scenarios

(Please scroll to the right, there is a small description in the last column)

Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.s.b.Benchmark5a.run    thrpt        10      238,515        7,769   ops/ms Like in the question, but with @tailrec
a.b.s.b.Benchmark5b.run    thrpt        10      130,202        2,232   ops/ms Using List.sum
a.b.s.b.Benchmark5c.run    thrpt        10     2756,132       29,920   ops/ms while (no List, vars, imperative)
a.b.s.b.Benchmark5d.run    thrpt        10      237,286        2,203   ops/ms tailrec version with pattern matching
a.b.s.b.Benchmark5e.run    thrpt        10      214,719        2,483   ops/ms Like in the question (= no tailrec opt)

5a and 5e are like in the question; 5a with @tailrec.
5b: List.sum: Was very slow
5c: Not a big surprise, the imperative version is the fastest one.
5d uses pattern matching instead of if (that would have been "my style"), I am adding the source of this one:

package app.benchmark.scala.benchmark5

import scala.annotation._
import org.openjdk.jmh.annotations.GenerateMicroBenchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import org.openjdk.jmh.runner.Runner
import org.openjdk.jmh.runner.RunnerException
import org.openjdk.jmh.runner.options.Options
import org.openjdk.jmh.runner.options.OptionsBuilder

@State(Scope.Benchmark)
object BenchmarkState5d {
  val list = List.range(1, 1000)
}

class Benchmark5d {
  private def add(list : List[Int]): Int = {
    @tailrec
    def add(list : List[Int], sum: Int): Int = {
      list match {
        case Nil =>
          sum
        case h :: t =>
          add(t, h + sum)
      }
    }
    add(list, 0)
  }

  @GenerateMicroBenchmark
  def run() = {
    add(BenchmarkState5d.list)
  }
}

The three Java scenarios

Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.j.b.Benchmark5a.run    thrpt        10       40,437        0,532   ops/ms mutable (rebuilds the list in each iteration)
a.b.j.b.Benchmark5b.run    thrpt        10        0,450        0,008   ops/ms subList
a.b.j.b.Benchmark5c.run    thrpt        10     2735,951       29,177   ops/ms for

If you really want to compare in the meaning of functional programming style (= immutable, tail recursion, head/tail decomposition), than the Java version is five times slower.

As pointed out in a comment by Marko Topolnik:

subList doesn't copy the tail, but it does something comparably bad, when applied to a LinkedList: it wraps the original list and uses an offset to accomodate the semantics. The result is that an O(n) recursive algorithm becomes O(n2)---the same as if the tail was repeatedly copied. Plus, wrappers accrete so you end up with the list wrapped a thousand times. Definitely not comparable to a head/tail list

public class Benchmark5a {
    public static int add(List<Integer> list, Integer sum) {
        if (list.isEmpty()) {
            return sum;
        } else {
            int headVal = list.remove(0);
            return add(list, sum + headVal);
        }
    }

    @GenerateMicroBenchmark
    public long run() {
        final List<Integer> list = new LinkedList<Integer>();
        for(int i = 0; i < 1000; i++) {
            list.add(i);
        }
        return add(list, 0);
    }

    public static void main(String[] args) {
        System.out.println(new Benchmark5a().run());
    }
}

@State(Scope.Benchmark)
class BenchmarkState5b {
    public final static List<Integer> list = new LinkedList<Integer>();

    static {
        for(int i = 0; i < 1000; i++) {
            list.add(i);
        }
    }
}

public class Benchmark5b {
    public static int add(List<Integer> list, int sum) {
        if (list.isEmpty()) {
            return sum;
        } else {
            int headVal = list.get(0);
            return add(list.subList(1, list.size()), sum + headVal);
        }
    }

    @GenerateMicroBenchmark
    public long run() {
        return add(BenchmarkState5b.list, 0);
    }

    public static void main(String[] args) {
        System.out.println(new Benchmark5b().run());
    }
}

Scala results in detail

(all results only show the last scenario, and the overall results)

[...]

# VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
# VM options: <none>
# Fork: 1 of 1
# Warmup: 3 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: app.benchmark.scala.benchmark5.Benchmark5e.run
# Warmup Iteration   1: 166,153 ops/ms
# Warmup Iteration   2: 215,242 ops/ms
# Warmup Iteration   3: 216,632 ops/ms
Iteration   1: 215,526 ops/ms
Iteration   2: 213,720 ops/ms
Iteration   3: 213,967 ops/ms
Iteration   4: 215,468 ops/ms
Iteration   5: 216,247 ops/ms
Iteration   6: 217,514 ops/ms
Iteration   7: 215,503 ops/ms
Iteration   8: 211,969 ops/ms
Iteration   9: 212,989 ops/ms
Iteration  10: 214,291 ops/ms

Result : 214,719 ±(99.9%) 2,483 ops/ms
  Statistics: (min, avg, max) = (211,969, 214,719, 217,514), stdev = 1,642
  Confidence interval (99.9%): [212,236, 217,202]


Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.s.b.Benchmark5a.run    thrpt        10      238,515        7,769   ops/ms
a.b.s.b.Benchmark5b.run    thrpt        10      130,202        2,232   ops/ms
a.b.s.b.Benchmark5c.run    thrpt        10     2756,132       29,920   ops/ms
a.b.s.b.Benchmark5d.run    thrpt        10      237,286        2,203   ops/ms
a.b.s.b.Benchmark5e.run    thrpt        10      214,719        2,483   ops/ms

Java results in detail

# VM invoker: /home/oracle-jdk-1.8-8u40/data/oracle-jdk-1.8.0_40/jre/bin/java
# VM options: <none>
# Fork: 1 of 1
# Warmup: 3 iterations, 1 s each
# Measurement: 10 iterations, 1 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: app.benchmark.java.benchmark5.Benchmark5c.run
# Warmup Iteration   1: 2777,495 ops/ms
# Warmup Iteration   2: 2888,040 ops/ms
# Warmup Iteration   3: 2692,851 ops/ms
Iteration   1: 2737,169 ops/ms
Iteration   2: 2745,368 ops/ms
Iteration   3: 2754,105 ops/ms
Iteration   4: 2706,131 ops/ms
Iteration   5: 2721,593 ops/ms
Iteration   6: 2769,261 ops/ms
Iteration   7: 2734,461 ops/ms
Iteration   8: 2741,494 ops/ms
Iteration   9: 2740,012 ops/ms
Iteration  10: 2709,915 ops/ms

Result : 2735,951 ±(99.9%) 29,177 ops/ms
  Statistics: (min, avg, max) = (2706,131, 2735,951, 2769,261), stdev = 19,299
  Confidence interval (99.9%): [2706,774, 2765,128]


Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.j.b.Benchmark5a.run    thrpt        10       40,437        0,532   ops/ms
a.b.j.b.Benchmark5b.run    thrpt        10        0,450        0,008   ops/ms
a.b.j.b.Benchmark5c.run    thrpt        10     2735,951       29,177   ops/ms

Update: Added another Java scenario 5d using ArrayList

Benchmark                   Mode   Samples         Mean   Mean error    Units
a.b.j.b.Benchmark5a.run    thrpt        10       34,931        0,504   ops/ms
a.b.j.b.Benchmark5b.run    thrpt        10        0,430        0,005   ops/ms
a.b.j.b.Benchmark5c.run    thrpt        10     2610,085        9,664   ops/ms
a.b.j.b.Benchmark5d.run    thrpt        10       56,693        1,218   ops/ms

answered Sep 19 '22 13:09

Beryllium

You cannot draw any meaningful conclusions about performance from such a short experiment. You might have hit a garbage collection, there might have been some other process hogging your CPU, all classes may not have been loaded and the JVM might be optimizing your code all while your test is running. Any of which would adversely effect test results.

Processing 1000 elements will be extremely fast. You should make your experiment long enough such that inaccuracy in timing measurement or external influences have a smaller effect. Try with a million elements and if that still only take a few seconds try 10 million.

Consider running the test a few times when the JVM starts up before taking the measurement to "warm up" the JVM. You want to ensure that any classes that are lazy loaded have been loaded and that any realtime optimization performed by the JVM is complete before you start your test.

Repeat the experiment a few more times using the longer list of elements. Then throw away the fastest result and the slowest result and average the rest.

answered Sep 18 '22 13:09

bhspencer

Related questions
                            
                                Spring Boot default properties encoding change?
                            
                                How can I re-throw an exception in a lambda block as from the outer block?
                            
                                javac: invalid flag: activation-1.1.jar
                            
                                pyjnius "Class not found" when importing jar file
                            
                                How to use KeyEvent in JavaFX project?
                            
                                Obfuscate Strings
                            
                                Null pointer Exception at ActionBar
                            
                                Current recommended way to convert byte[] to JsonNode and back
                            
                                How to use PathFilter in Apache Spark?
                            
                                Time Complexity of finding a basin
                            
                                Make IntelliJ aware of a properties file
                            
                                How to add table heading drop-down with Apache Poi
                            
                                How to set the timezone region for JDBC Connection and avoid the SqlException timezone region not found?
                            
                                How to enforce generic type with Kotlin interop
                            
                                What is Java equivalent of C# default(T)
                            
                                Why does java.io.File not implement Autocloseable? [duplicate]
                            
                                Android Studio displaying a warning unchecked call for the following code
                            
                                Is it safe to catch StackOverflowError in Java?
                            
                                Error inflating class fragment InflateException Binary XML file
                            
                                "Warning: [unchecked] unchecked cast" when casting Object to ArrayList<String[]>

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With