Is multiple assignment (e.g. val (x, y) = (1, 2)) less efficient at runtime than the corresponding single assignments (val x = 1; val y = 2)?
I can imagine the answer being "yes" because Scala might need to construct an intermediate tuple. Is this correct?
What if I already had a tuple lying around, e.g. val tup = (1, 2)?
Is it now more efficient to do
(a) val (x, y) = tup
or
(b) val x = tup._1; val y = tup._2
or are they the same?
The difference from the previous example is that the RHS no longer needs to be allocated.
You can use the new :javap feature of the Scala 2.9 REPL:
scala> class A { val (a, b) = (1, 2) }
scala> :javap -c A
Compiled from "<console>"
public class A extends java.lang.Object implements scala.ScalaObject{
...
public A();
Code:
0: aload_0
1: invokespecial #22; //Method java/lang/Object."<init>":()V
4: aload_0
5: new #24; //class scala/Tuple2$mcII$sp
8: dup
9: iconst_1
10: iconst_2
11: invokespecial #27; //Method scala/Tuple2$mcII$sp."<init>":(II)V
14: astore_1
15: aload_1
16: ifnull 68
19: aload_1
20: astore_2
21: new #24; //class scala/Tuple2$mcII$sp
24: dup
25: aload_2
26: invokevirtual #33; //Method scala/Tuple2._1:()Ljava/lang/Object;
29: invokestatic #39; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
32: aload_2
33: invokevirtual #42; //Method scala/Tuple2._2:()Ljava/lang/Object;
36: invokestatic #39; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
39: invokespecial #27; //Method scala/Tuple2$mcII$sp."<init>":(II)V
42: putfield #44; //Field x$1:Lscala/Tuple2;
45: aload_0
46: aload_0
47: getfield #44; //Field x$1:Lscala/Tuple2;
50: invokevirtual #47; //Method scala/Tuple2._1$mcI$sp:()I
53: putfield #14; //Field a:I
56: aload_0
57: aload_0
58: getfield #44; //Field x$1:Lscala/Tuple2;
61: invokevirtual #50; //Method scala/Tuple2._2$mcI$sp:()I
64: putfield #16; //Field b:I
67: return
68: new #52; //class scala/MatchError
71: dup
72: aload_1
73: invokespecial #55; //Method scala/MatchError."<init>":(Ljava/lang/Object;)V
76: athrow
}
scala> class B { val a = 1; val b = 2 }
scala> :javap -c B
Compiled from "<console>"
public class B extends java.lang.Object implements scala.ScalaObject{
...
public B();
Code:
0: aload_0
1: invokespecial #20; //Method java/lang/Object."<init>":()V
4: aload_0
5: iconst_1
6: putfield #12; //Field a:I
9: aload_0
10: iconst_2
11: putfield #14; //Field b:I
14: return
}
So I guess the answer is that the tuple version is slower. I wonder why there is boxing going on, though; shouldn't that be gone with the specialization of tuples?
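To make the bytecode for class A above more concrete, here is roughly what it corresponds to in source form. This is an approximation, not the exact desugaring: the class and field names (AEquivalent, x$1) are illustrative, and the null check / MatchError branch visible in the bytecode is only described in comments.

```scala
// Approximate source-level equivalent of: class A { val (a, b) = (1, 2) }
// The compiler first builds the RHS tuple, then pattern-matches on it,
// allocating a SECOND tuple to hold the extracted values in a synthetic
// field (x$1 in the bytecode) before copying them into a and b.
// The bytecode additionally null-checks the scrutinee and throws a
// MatchError on failure (offsets 16 and 68-76 above).
class AEquivalent {
  private[this] val x$1: (Int, Int) = (1, 2) match {
    case (p1, p2) => (p1, p2) // re-allocates a Tuple2; _1/_2 go through boxing here
  }
  val a: Int = x$1._1
  val b: Int = x$1._2
}
```

This explains both extra costs seen in the bytecode: the second Tuple2 allocation and the BoxesRunTime.unboxToInt calls, compared with class B, which compiles to two plain putfield instructions.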
I wasn't really satisfied with the lack of benchmarks, so here are some, done using https://github.com/sirthias/scala-benchmarking-template, which uses Google Caliper in the background. The charts contain the calculated values (ns per inner-loop execution), while the text results are taken directly from the console. The code:
package org.example

import annotation.tailrec
import com.google.caliper.Param

class Benchmark extends SimpleScalaBenchmark {

  @Param(Array("10", "100", "1000", "10000"))
  val length: Int = 0

  var array: Array[Int] = _

  override def setUp() {
    array = new Array(length)
  }

  def timeRegular(reps: Int) = repeat(reps) {
    var result = 0
    array.foreach { value =>
      val tuple = (value, value)
      val (out1, out2) = tuple
      result += out1
      result += out2
    }
    result
  }

  def timeUnpack(reps: Int) = repeat(reps) {
    var result = 0
    array.foreach { value =>
      val tuple = (value, value)
      val out1 = tuple._1
      val out2 = tuple._2
      result += out1
      result += out2
    }
    result
  }

  def timeBoxedUnpack(reps: Int) = repeat(reps) {
    var result = 0
    array.foreach { value =>
      val tuple = (value, value, value)
      val out1 = tuple._1
      val out2 = tuple._2
      val out3 = tuple._3
      result += out1
      result += out2
      result += out3
    }
    result
  }
}
0% Scenario{vm=java, trial=0, benchmark=Regular, length=10} 102.09 ns; σ=1.04 ns @ 10 trials
8% Scenario{vm=java, trial=0, benchmark=Unpack, length=10} 28.23 ns; σ=0.27 ns @ 6 trials
17% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10} 110.17 ns; σ=1.95 ns @ 10 trials
25% Scenario{vm=java, trial=0, benchmark=Regular, length=100} 909.73 ns; σ=6.42 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Unpack, length=100} 271.40 ns; σ=1.35 ns @ 3 trials
42% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=100} 946.59 ns; σ=8.38 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=Regular, length=1000} 8966.33 ns; σ=40.17 ns @ 3 trials
58% Scenario{vm=java, trial=0, benchmark=Unpack, length=1000} 2517.54 ns; σ=4.56 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=1000} 9374.71 ns; σ=68.25 ns @ 3 trials
75% Scenario{vm=java, trial=0, benchmark=Regular, length=10000} 81244.84 ns; σ=661.81 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=Unpack, length=10000} 23502.73 ns; σ=122.83 ns @ 3 trials
92% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10000} 112683.27 ns; σ=1101.51 ns @ 4 trials
length benchmark ns linear runtime
10 Regular 102.1 =
10 Unpack 28.2 =
10 BoxedUnpack 110.2 =
100 Regular 909.7 =
100 Unpack 271.4 =
100 BoxedUnpack 946.6 =
1000 Regular 8966.3 ==
1000 Unpack 2517.5 =
1000 BoxedUnpack 9374.7 ==
10000 Regular 81244.8 =====================
10000 Unpack 23502.7 ======
10000 BoxedUnpack 112683.3 ==============================
0% Scenario{vm=java, trial=0, benchmark=Regular, length=10} 28.26 ns; σ=0.13 ns @ 3 trials
8% Scenario{vm=java, trial=0, benchmark=Unpack, length=10} 28.27 ns; σ=0.07 ns @ 3 trials
17% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10} 109.56 ns; σ=2.27 ns @ 10 trials
25% Scenario{vm=java, trial=0, benchmark=Regular, length=100} 273.40 ns; σ=2.73 ns @ 5 trials
33% Scenario{vm=java, trial=0, benchmark=Unpack, length=100} 271.25 ns; σ=2.63 ns @ 6 trials
42% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=100} 1088.00 ns; σ=10.60 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=Regular, length=1000} 2516.30 ns; σ=7.13 ns @ 3 trials
58% Scenario{vm=java, trial=0, benchmark=Unpack, length=1000} 2525.00 ns; σ=24.25 ns @ 6 trials
67% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=1000} 10188.98 ns; σ=101.32 ns @ 3 trials
75% Scenario{vm=java, trial=0, benchmark=Regular, length=10000} 25886.80 ns; σ=116.33 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=Unpack, length=10000} 25938.97 ns; σ=76.02 ns @ 3 trials
92% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10000} 115629.82 ns; σ=1159.41 ns @ 5 trials
length benchmark ns linear runtime
10 Regular 28.3 =
10 Unpack 28.3 =
10 BoxedUnpack 109.6 =
100 Regular 273.4 =
100 Unpack 271.2 =
100 BoxedUnpack 1088.0 =
1000 Regular 2516.3 =
1000 Unpack 2525.0 =
1000 BoxedUnpack 10189.0 ==
10000 Regular 25886.8 ======
10000 Unpack 25939.0 ======
10000 BoxedUnpack 115629.8 ==============================
Unpacking tuples is fast as long as the tuple arity is <= 2. If it is greater than 2, there is too much indirection for the HotSpot compiler to optimise.
Scala 2.9.2 has some sort of weird issue that makes destructuring assignment with tuples slower than regular assignment. Weird, but it can probably be disregarded.
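The arity cutoff lines up with how the standard library specializes tuples: Tuple2 is @specialized for Int (among other primitives), so an (Int, Int) pair gets the Tuple2$mcII$sp subclass with unboxed fields, while Tuple3 and above have no specialization and always box their elements. A quick way to check this at the REPL or in a small program (the object name TupleSpecializationCheck is mine; the runtime class names are what Scala 2.x produces):

```scala
// Tuple2[Int, Int] uses a specialized subclass with primitive int fields;
// Tuple3[Int, Int, Int] is a plain generic Tuple3, so its Ints are boxed
// to java.lang.Integer. This is why timeBoxedUnpack is so much slower.
object TupleSpecializationCheck {
  val pair   = (1, 2)    // expected runtime class: scala.Tuple2$mcII$sp
  val triple = (1, 2, 3) // plain scala.Tuple3, elements boxed

  def main(args: Array[String]): Unit = {
    println(pair.getClass.getName)
    println(triple.getClass.getName)
  }
}
```

Printing the runtime class names makes the difference visible without digging through bytecode: the pair reports a "$mcII$sp" specialized class, while the triple is just scala.Tuple3.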
The benchmarks were run with:
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)