
Performance of serializable lambdas in Java 8

I have read in some comments by Brian Goetz that serializable lambdas "have significantly higher performance costs compared to nonserializable lambdas".

I am curious now: where exactly is that overhead, and what causes it? Does it affect only the instantiation of a lambda, or also its invocation?

In the code below, would both cases (callExistingInstance() and callWithNewInstance()) be affected by the serializability of "MyFunction", or only the second case?

interface MyFunction<IN, OUT> {
    OUT call(IN arg);
}

void callExistingInstance() {

    long toAdd = 1;
    long value = 0;

    final MyFunction<Long, Long> adder = (number) -> number + toAdd;

    for (int i = 0; i < LARGE_NUMBER; i++) {
        value = adder.call(value);
    }
}

void callWithNewInstance() {

    long value = 0;

    for (int i = 0; i < LARGE_NUMBER; i++) {
        long toAdd = 1;

        MyFunction<Long, Long> adder = (number) -> number + toAdd;

        value = adder.call(value);
    }
}
Stephan Ewen, asked Jan 07 '15 16:01


2 Answers

The performance hit comes when you serialize or deserialize a lambda, and when you instantiate it. Only your second example takes the hit. Deserialization is expensive because the lambda cannot be restored like a plain old serialized object (where would the class definition come from?). Instead, the underlying class of your lambda is instantiated by a sort of special reflection that has the ability to create and define a class, and some security checks are performed as well.

BadZen, answered Sep 19 '22 01:09


Normally, the runtime part of the lambda implementation will generate a class which basically consists of a single implementation method. The information needed to generate such a class is provided by a bootstrap method invocation of LambdaMetafactory.metafactory at runtime.

When enabling serialization, things get more complicated. First, the compiled code will use the alternative bootstrap method LambdaMetafactory.altMetafactory, which offers greater flexibility at the price of having to parse varargs parameters according to flags specified within the parameter array.

Then the generated lambda class has to have a writeReplace method (see the second half of the Serializable documentation) which has to create and return a SerializedLambda instance containing all information required to re-create the lambda instance. Since the single implementation method of a lambda’s class consists of a simple delegation call only, that writeReplace method and the related constant information will multiply the generated class’ size.
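
To make the writeReplace step a bit more tangible, here is a minimal sketch that calls the generated method reflectively and inspects the resulting SerializedLambda. The names WriteReplaceDemo, MySerializableFunction, adder and toSerializedForm are made up for this illustration, and the reflective access relies on how current OpenJDK builds generate the lambda class, so treat it as an inspection trick rather than a supported API.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

final class WriteReplaceDemo {

    // Hypothetical serializable variant of the question's MyFunction.
    interface MySerializableFunction<IN, OUT> extends Serializable {
        OUT call(IN arg);
    }

    // A capturing lambda: toAdd becomes a captured argument.
    static MySerializableFunction<Long, Long> adder(long toAdd) {
        return number -> number + toAdd;
    }

    // Invokes the writeReplace method that the runtime adds to the
    // generated class of a serializable lambda.
    static SerializedLambda toSerializedForm(Serializable lambda)
            throws ReflectiveOperationException {
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        return (SerializedLambda) writeReplace.invoke(lambda);
    }

    public static void main(String[] args) throws Exception {
        SerializedLambda form = toSerializedForm(adder(1));
        System.out.println("interface method: " + form.getFunctionalInterfaceMethodName());
        System.out.println("impl method:      " + form.getImplMethodName());
        System.out.println("captured args:    " + form.getCapturedArgCount());
    }
}
```

The SerializedLambda carries the names and signatures of the functional interface and the implementation method plus the captured values, which is the "related constant information" that has to be embedded in the generated class.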

It’s also worth noting that the class creating that Serializable lambda instance will have a synthetic method $deserializeLambda$ (compare the class documentation of SerializedLambda) as a counterpart to the process of the lambda’s writeReplace. That will increase your class’s disk usage and loading time (but not affect the evaluation of the lambda expressions).
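
As a rough illustration of that counterpart, the following sketch looks up $deserializeLambda$ on the class that contains the lambda expression and then sends a lambda through ordinary Java serialization. DeserializeLambdaDemo, MySerializableFunction, adder, hasDeserializeHook and roundTrip are names invented for this example; it assumes the code is compiled with javac and run on a standard JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;

final class DeserializeLambdaDemo {

    // Hypothetical serializable variant of the question's MyFunction.
    interface MySerializableFunction<IN, OUT> extends Serializable {
        OUT call(IN arg);
    }

    static MySerializableFunction<Long, Long> adder(long toAdd) {
        return number -> number + toAdd;
    }

    // True if the compiler added the $deserializeLambda$ method to the class.
    static boolean hasDeserializeHook(Class<?> capturingClass) {
        try {
            capturingClass.getDeclaredMethod("$deserializeLambda$", SerializedLambda.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Serializes the value to a byte array and reads it back.
    static <T> T roundTrip(T value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("hook present: " + hasDeserializeHook(DeserializeLambdaDemo.class));
        MySerializableFunction<Long, Long> copy = roundTrip(adder(5));
        System.out.println("copy.call(2) = " + copy.call(2L));
    }
}
```

On the way out, writeReplace substitutes a SerializedLambda for the lambda. On the way in, that SerializedLambda is resolved by calling back into $deserializeLambda$ of the capturing class, which re-creates the lambda.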


In your example code, both methods would be affected by the same amount of time, as the bootstrapping and class generation happen only once per lambda expression. On subsequent evaluations, the class generated on the first evaluation is re-used and only a new instance is created (if the instance itself is not re-used as well). We are talking about a one-time overhead here: even when the lambda expression is contained in a loop, it affects the first iteration only.

Note that if you have a lambda expression within a loop, there might be a new instance created for each iteration, while having it outside the loop guarantees a single instance during the entire loop. But this behavior does not depend on whether the target interface is Serializable. It merely depends on whether the expression captures a value (compare with this answer).
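
If you want to observe this yourself, a small experiment along the following lines could help. It evaluates four lambda expressions repeatedly and counts distinct instances and distinct generated classes. LambdaReuseDemo, MySerializableFunction and the helper names are invented for this sketch, and the instance counts are an implementation detail of the JVM in use, so no particular numbers are promised here.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Supplier;

final class LambdaReuseDemo {

    interface MyFunction<IN, OUT> {
        OUT call(IN arg);
    }

    // Hypothetical serializable counterpart of MyFunction.
    interface MySerializableFunction<IN, OUT> extends MyFunction<IN, OUT>, Serializable {
    }

    static MyFunction<Long, Long> plainNonCapturing() {
        return number -> number + 1;
    }

    static MyFunction<Long, Long> plainCapturing(long toAdd) {
        return number -> number + toAdd;
    }

    static MySerializableFunction<Long, Long> serializableNonCapturing() {
        return number -> number + 1;
    }

    static MySerializableFunction<Long, Long> serializableCapturing(long toAdd) {
        return number -> number + toAdd;
    }

    // Evaluates the expression repeatedly and returns
    // { number of distinct instances, number of distinct classes }.
    static int[] distinct(int iterations, Supplier<Object> expression) {
        Set<Object> instances = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<Class<?>> classes = new HashSet<>();
        for (int i = 0; i < iterations; i++) {
            Object lambda = expression.get();
            instances.add(lambda);
            classes.add(lambda.getClass());
        }
        return new int[] { instances.size(), classes.size() };
    }

    static void report(String label, int[] counts) {
        System.out.println(label + ": instances=" + counts[0] + ", classes=" + counts[1]);
    }

    public static void main(String[] args) {
        int n = 1000;
        report("plain, non-capturing       ", distinct(n, LambdaReuseDemo::plainNonCapturing));
        report("plain, capturing           ", distinct(n, () -> plainCapturing(1)));
        report("serializable, non-capturing", distinct(n, LambdaReuseDemo::serializableNonCapturing));
        report("serializable, capturing    ", distinct(n, () -> serializableCapturing(1)));
    }
}
```

The point of interest is that the class count should stay at one per lambda expression, and that the serializable rows should behave like their plain counterparts. Only the capturing/non-capturing distinction is expected to change the instance count.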

Note that if you had written

final long toAdd = 1;
MyFunction<Long, Long> adder = (number) -> number + toAdd;

in your second method (note the explicit final modifier), the variable toAdd would be a compile-time constant and the expression would be compiled as if you had written (number) -> number + 1, i.e. it would not capture a value anymore. Then you would get the same lambda instance in each loop iteration (with the current version of Oracle’s JVM). So whether a new instance is created sometimes depends on small details of the context. But usually, the performance impact is rather small.
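
One way to check the effect of the final modifier is to ask the lambda how many values it captured. The sketch below does that through SerializedLambda, so it uses a serializable interface purely as an inspection tool. ConstantCaptureDemo, MySerializableFunction and the method names are assumptions made for this example, and the reflective writeReplace call depends on how current OpenJDK builds generate lambda classes.

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

final class ConstantCaptureDemo {

    // Hypothetical serializable variant of the question's MyFunction.
    interface MySerializableFunction<IN, OUT> extends Serializable {
        OUT call(IN arg);
    }

    // toAdd is effectively final but not a constant variable, so it is captured.
    static MySerializableFunction<Long, Long> withPlainLocal() {
        long toAdd = 1;
        return number -> number + toAdd;
    }

    // With the explicit final modifier, toAdd is a compile-time constant
    // and is inlined into the lambda body instead of being captured.
    static MySerializableFunction<Long, Long> withFinalConstant() {
        final long toAdd = 1;
        return number -> number + toAdd;
    }

    // Reads the number of captured values from the lambda's serialized form.
    static int capturedArgCount(Serializable lambda) throws ReflectiveOperationException {
        Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
        writeReplace.setAccessible(true);
        SerializedLambda form = (SerializedLambda) writeReplace.invoke(lambda);
        return form.getCapturedArgCount();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("plain local:    " + capturedArgCount(withPlainLocal()));
        System.out.println("final constant: " + capturedArgCount(withFinalConstant()));
    }
}
```

Both lambdas compute the same result. They differ only in whether a value has to be carried along in the instance, which is what decides whether the runtime can hand out one shared instance.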

Holger, answered Sep 20 '22 01:09