Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Understanding Java 8 Streams at the Bytecode Level

There is a wealth of information and tutorials online regarding streams in Java 8. Most of what I have found does a good job of explaining the how the various elements of a stream work on a conceptual level. However, I have not encountered much material which describes how streams are actually implemented and executed by the JVM under the hood.

Consider comparing an operation on a Collection between using a stream and doing it the old school pre-Java 8 way. Would the underlying Bytecodes look the same between the two approaches? Would the performance be the same?

To make this concrete, consider the following example, where I have a need to find all fish whose name contains the word "fish", and then capitalize the first letter of each matching fish. (Yes, I know that Hagfish isn't really a fish, but I ran out of matching fish names.)

List<String> fishList = Arrays.asList("catfish", "hagfish", "salmon", "tuna", "blowfish");

// Pre Java-8 solution
List<String> hasFishList = new ArrayList<String>();

for (String fish : fishList) {
    if (fish.contains("fish")) {
        String fishCap = fish.substring(0, 1).toUpperCase() + fish.substring(1); 
        hasFishList.add(fishCap);
    }
}

// Java-8 solution using streams
List<String> hasFishList = fishList.stream()
    .filter(f -> f.contains("fish"))
    .map(f -> f.substring(0, 1).toUpperCase() + f.substring(1))
    .collect(Collectors.toList());

Any insight you might have into how these two approaches might differ under the hood, at the Bytecode level, would be great. And some actual byte code would be even better.

like image 591
Tim Biegeleisen Avatar asked Aug 23 '15 11:08

Tim Biegeleisen


People also ask

How does Java 8 streams work internally?

Introduced in Java 8, the Stream API is used to process collections of objects. A stream is a sequence of objects that supports various methods which can be pipelined to produce the desired result. A stream is not a data structure instead it takes input from the Collections, Arrays or I/O channels.

What are the streams in Java 8?

Java 8 offers the possibility to create streams out of three primitive types: int, long and double. As Stream<T> is a generic interface, and there is no way to use primitives as a type parameter with generics, three new special interfaces were created: IntStream, LongStream, DoubleStream.

Why is Java 8 streaming so fast?

In Java8 Streams, performance is achieved by parallelism, laziness, and using short-circuit operations, but there is a downside as well, and we need to be very cautious while choosing Streams, as it may degrade the performance of your application. Let us look at these factors which are meant for Streams' performance.

Does Java 8 streams support functional programming?

However, Java 8 provides us many functional interfaces by default for different use cases under the package java. util. function. Many of these functional interfaces provide support for function composition in terms of default and static methods.


1 Answers

The answer has grown quite a bit over time so I will start with a summary:

Observations

  • Trace of what streams API really executes looks scary at first sight. Many calls and object creations. Notice, however, that the only part repeated for all elements in the collection is the body of the do-while loop. So apart from some constant overhead, the per-element overhead are ~6 virtual method calls (invokeinterface instructions - our 2 lambdas and 4 accept() calls on sinks).
  • The lambdas given to streams API calls are translated to a static method containing implementation and an invokedynamic instruction. Rather than creating a new object, it gives a prescription how to create the lambda at runtime. There is nothing special about calling the lambda methods on the created lambda objects afterwards (invokeinterface instruction).
  • You can observe how the stream is evaluated lazily. filter() and map() wrap their operations in anonymous subclasses of StatelessOp which in turn extend ReferencePipeline, AbstractPipeline and ultimately BaseStream. The actual evaluation is done when executing collect().
  • You can see how streams really use Spliterator rather then Iterator. Notice the many forks checking isParallel() - the parallel branches would leverage Spliterator's methods.
  • There are quite a few new objects created, at least 13. If you call such code in a loop, you may run into garbage collection problems. For a single execution it should be fine.
  • I would like to see a benchmark comparison of the two version. Streams version will probably be slower with difference from "Java 7 version" decreasing with increasing number of fish. See also a related SO question.

Trace Through Execution of Streams Usage in the Example

The pseudocode below captures a trace through the execution of the version using streams. See the bottom of this post for explanation how to read the trace.

Stream stream1 = fishList.stream();
    // Collection#stream():
    Spliterator spliterator = fishList.spliterator();
        return Spliterators.spliterator(fishList.a, 0);
            return new ArraySpliterator(fishList, 0);
    return StreamSupport.stream(spliterator, false)
        return new ReferencePipeline.Head(spliterator, StreamOpFlag.fromCharacteristics(spliterator), false)
Predicate fishPredicate = /* new lambda f -> f.contains("fish") */
Stream stream2 = stream1.filter(fishPredicate);
    return new StatelessOp(this, StreamShape.REFERENCE, StreamOpFlag.NOT_SIZED) { /* ... */ }
Function fishFunction = /* new lambda f.substring(0, 1).toUpperCase() + f.substring(1) */
Stream stream3 = stream2.map(fishFunction);
    return new StatelessOp(this, StreamShape.REFERENCE, StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) { /* ... */ }
Collector collector = Collectors.toList();
    Supplier supplier = /* new lambda */
    BiConsumer accumulator = /* new lambda */
    BinaryOperator combiner = /* new lambda */
    return new CollectorImpl<>(supplier, accumulator, combiner, CH_ID);
List hasFishList = stream3.collect(collector)
    // ReferencePipeline#StatelessOp#collect(Collector):
    List container;
    if (stream3.isParallel() && /* not executed */) { /* not executed */ }
    else {
    /*>*/TerminalOp terminalOp = ReduceOps.makeRef(collector)
            Supplier supplier = Objects.requireNonNull(collector).supplier();
            BiConsumer accumulator = collector.accumulator();
            BinaryOperator combiner = collector.combiner();
            return new ReduceOp(StreamShape.REFERENCE) { /* ... */ }
    /*>*/container = stream3.evaluate(terminalOp);
            // AbstractPipeline#evaluate(TerminalOp):
            if (linkedOrConsumed) { /* not executed */ }
            linkedOrConsumed = true;
            if (isParallel()) { /* not executed */ }
            else {
            /*>*/Spliterator spliterator2 = sourceSpliterator(terminalOp.getOpFlags())
                    // AbstractPipeline#sourceSpliterator(int):
                    if (sourceStage.sourceSpliterator != null) { /* not executed */ }
                    /* ... */
                    if (isParallel()) { /* not executed */ }
                    return spliterator;
            /*>*/terminalOp.evaluateSequential(stream3, spliterator2);
                    // ReduceOps#ReduceOp#evaluateSequential(PipelineHelper, Spliterator):
                    ReducingSink sink = terminalOp.makeSink()
                        return new ReducingSink()
                    Sink sink = terminalOp.wrapAndCopyInto(sink, spliterator)
                        Sink wrappedSink = wrapSink(sink)
                            // AbstractPipeline#wrapSink(Sink)
                            for (/* executed twice */) { p.opWrapSink(p.previousStage.combinedFlags, sink) }
                                return new Sink.ChainedReference(sink)
                        terminalOp.copyInto(wrappedSink, spliterator);
                            // AbstractPipeline#copyInto()
                            if (!StreamOpFlag.SHORT_CIRCUIT.isKnown(getStreamAndOpFlags())) {
                            /*>*/wrappedSink.begin(spliterator.getExactSizeIfKnown());
                            /*>*/ /* not important */
                            /*>*/supplier.get() // initializes ArrayList
                            /*>*/spliterator.forEachRemaining(wrappedSink)
                                    // Spliterators#ArraySpliterator#foreachRemaining(Consumer):
                                    // ... unimportant code
!!                                  do {
                                    /*>*/action.accept((String)a[i])
                                    } while (++i < hi) // for each fish :)
                            /*>*/wrappedSink.end() // no-op
                            } else { /* not executed */}
                        return sink;
                    return sink.get()
            }
    /*>*/if (collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) { return container; }
    /*>*/else { /* not executed */ }

The exclamation marks point to the actual workhorse: a do-while loop in fishList's Spliterator. Here is more detailed trace of the do-while loop:

do {
/*>*/action.accept((String)a[i])
    if (predicate.test(u)) { downstream.accept(u); }  // predicate is our fishPredicate
        downstream.accept(mapper.apply(u)); // mapper is our fishFunction
            accumulator.accept(u)
                // calls add(u) on resulting ArrayList
} while (++i < hi) // for each fish :)

Streams API with Lambdas on Bytecode Level

Let's look at how the relevant parts of the executed code look like in the bytecode. The interesting part is how

fishList.stream().filter(f -> f.contains("fish")).map(f -> f.substring(0, 1).toUpperCase() + f.ubstring(1)).collect(Collectors.toList());

is translated. You can find the full version on pastebin. I will focus only on
filter(f -> f.contains("fish")) here:

invokedynamic #26,  0         // InvokeDynamic #0:test:()Ljava/util/function/Predicate; [
    java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
    (Ljava/lang/Object;)Z, 
    FishTest.lambda$fish8$0(Ljava/lang/String;)Z, 
    (Ljava/lang/String;)Z
  ]
invokeinterface #27,  2       // InterfaceMethod java/util/stream/Stream.filter:(Ljava/util/function/Predicate;)Ljava/util/stream/Stream;
  

There is nothing specific to streams API there, but the new invokedynamic instruction is used to create lambdas. The Java 7 equivalent of lambdas would be creation of anonymous inner class implementing Predicate. This would be translated to bytecode as:

new FishTest$1                        // create new instance of Predicate
dup
invokespecial FishTest$1.<init>()V    // call constructor

Creating a lambda in Java 8 is translated as a single invokedynamic instruction instead, without creation of a new object. The purpose of invokedynamic instruction is to defer creation of lambda to runtime (as opposed to compile time). This enables features like caching lambda instances:

The use of invokedynamic lets us defer the selection of a translation strategy until run time. The runtime implementation is free to select a strategy dynamically to evaluate the lambda expression. ... The invokedynamic mechanics allow this to be done without the performance costs that this late binding approach might otherwise impose. ... For example, ... we generate the class the first time a given lambda factory site is called. Thereafter, future calls to that lambda factory site will re-use the class generated on the first call.

Arguments of invokedynamic give a "recipe" for constructing an instance of the respective functional interface. They represent the metafactory for runtime instance creation, reference to the method it implements (i.e. Predicate.test()) and implementation of the method.
In our case, the implementation is invocation of static method boolean lambda$fish8$0(String) which the compiler sneaks in to our class. It contains the actual bytecode for f.contains("fish"). If you used lambda capturing method references (e.g. list::add), captured variables from the outer scope, etc., things would get more complex - look for occurences of "indy" in this document for more info.

The other parts of bytecode are less interesting. The do-while loop, apart from the obvious looping, contains an invokeinterface instruction calling accept() on the respective Consumer. accept() call propagates down the sinks, calling our lambdas along the way. Nothing special here, both lambda calls, and propagation through sinks are simple invokeinterface instructions.


How to read the pseudocode

Indentation is used to show the unfolded body of call above the indented code. Code beggining with /*>*/ represents continuation of the current call (when needed for better readability). Therefore call

Objects.requireNonNull(new Object());

would be written in the trace pseudocode as:

Object o = new Object(); // extracted variable to improve visibility of new instance creation
Objects.requireNonNull(o);
    // this is the body of Objects.requireNonNull():
    if (o == null) {
    /*>*/throw new NullPointerException(); // this line is still part of  requireNonNull() body
    }
    return o;

I also skipped some unimportant calls like null checks, omitted generic parameters, extracted inlined expressions to variables where appropriate, etc., to improve readability.

like image 121
Mifeet Avatar answered Oct 02 '22 10:10

Mifeet