Understanding Java 8 Streams at the Bytecode Level

Tags:

There is a wealth of information and tutorials online regarding streams in Java 8. Most of what I have found does a good job of explaining the how the various elements of a stream work on a conceptual level. However, I have not encountered much material which describes how streams are actually implemented and executed by the JVM under the hood.

Consider comparing an operation on a Collection between using a stream and doing it the old school pre-Java 8 way. Would the underlying Bytecodes look the same between the two approaches? Would the performance be the same?

To make this concrete, consider the following example, where I have a need to find all fish whose name contains the word "fish", and then capitalize the first letter of each matching fish. (Yes, I know that Hagfish isn't really a fish, but I ran out of matching fish names.)

List<String> fishList = Arrays.asList("catfish", "hagfish", "salmon", "tuna", "blowfish");

// Pre Java-8 solution
List<String> hasFishList = new ArrayList<String>();

for (String fish : fishList) {
    if (fish.contains("fish")) {
        String fishCap = fish.substring(0, 1).toUpperCase() + fish.substring(1); 
        hasFishList.add(fishCap);
    }
}

// Java-8 solution using streams
List<String> hasFishList = fishList.stream()
    .filter(f -> f.contains("fish"))
    .map(f -> f.substring(0, 1).toUpperCase() + f.substring(1))
    .collect(Collectors.toList());

Any insight you might have into how these two approaches might differ under the hood, at the Bytecode level, would be great. And some actual byte code would be even better.

591

asked Aug 23 '15 11:08

Tim Biegeleisen

1 Answers

The answer has grown quite a bit over time so I will start with a summary:

Observations

Trace of what streams API really executes looks scary at first sight. Many calls and object creations. Notice, however, that the only part repeated for all elements in the collection is the body of the do-while loop. So apart from some constant overhead, the per-element overhead are ~6 virtual method calls (invokeinterface instructions - our 2 lambdas and 4 accept() calls on sinks).
The lambdas given to streams API calls are translated to a static method containing implementation and an invokedynamic instruction. Rather than creating a new object, it gives a prescription how to create the lambda at runtime. There is nothing special about calling the lambda methods on the created lambda objects afterwards (invokeinterface instruction).
You can observe how the stream is evaluated lazily. filter() and map() wrap their operations in anonymous subclasses of StatelessOp which in turn extend ReferencePipeline, AbstractPipeline and ultimately BaseStream. The actual evaluation is done when executing collect().
You can see how streams really use Spliterator rather then Iterator. Notice the many forks checking isParallel() - the parallel branches would leverage Spliterator's methods.
There are quite a few new objects created, at least 13. If you call such code in a loop, you may run into garbage collection problems. For a single execution it should be fine.
_{I would like to see a benchmark comparison of the two version. Streams version will probably be slower with difference from "Java 7 version" decreasing with increasing number of fish. See also a related SO question.}

Trace Through Execution of Streams Usage in the Example

The pseudocode below captures a trace through the execution of the version using streams. See the bottom of this post for explanation how to read the trace.

Stream stream1 = fishList.stream();
    // Collection#stream():
    Spliterator spliterator = fishList.spliterator();
        return Spliterators.spliterator(fishList.a, 0);
            return new ArraySpliterator(fishList, 0);
    return StreamSupport.stream(spliterator, false)
        return new ReferencePipeline.Head(spliterator, StreamOpFlag.fromCharacteristics(spliterator), false)
Predicate fishPredicate = /* new lambda f -> f.contains("fish") */
Stream stream2 = stream1.filter(fishPredicate);
    return new StatelessOp(this, StreamShape.REFERENCE, StreamOpFlag.NOT_SIZED) { /* ... */ }
Function fishFunction = /* new lambda f.substring(0, 1).toUpperCase() + f.substring(1) */
Stream stream3 = stream2.map(fishFunction);
    return new StatelessOp(this, StreamShape.REFERENCE, StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) { /* ... */ }
Collector collector = Collectors.toList();
    Supplier supplier = /* new lambda */
    BiConsumer accumulator = /* new lambda */
    BinaryOperator combiner = /* new lambda */
    return new CollectorImpl<>(supplier, accumulator, combiner, CH_ID);
List hasFishList = stream3.collect(collector)
    // ReferencePipeline#StatelessOp#collect(Collector):
    List container;
    if (stream3.isParallel() && /* not executed */) { /* not executed */ }
    else {
    /*>*/TerminalOp terminalOp = ReduceOps.makeRef(collector)
            Supplier supplier = Objects.requireNonNull(collector).supplier();
            BiConsumer accumulator = collector.accumulator();
            BinaryOperator combiner = collector.combiner();
            return new ReduceOp(StreamShape.REFERENCE) { /* ... */ }
    /*>*/container = stream3.evaluate(terminalOp);
            // AbstractPipeline#evaluate(TerminalOp):
            if (linkedOrConsumed) { /* not executed */ }
            linkedOrConsumed = true;
            if (isParallel()) { /* not executed */ }
            else {
            /*>*/Spliterator spliterator2 = sourceSpliterator(terminalOp.getOpFlags())
                    // AbstractPipeline#sourceSpliterator(int):
                    if (sourceStage.sourceSpliterator != null) { /* not executed */ }
                    /* ... */
                    if (isParallel()) { /* not executed */ }
                    return spliterator;
            /*>*/terminalOp.evaluateSequential(stream3, spliterator2);
                    // ReduceOps#ReduceOp#evaluateSequential(PipelineHelper, Spliterator):
                    ReducingSink sink = terminalOp.makeSink()
                        return new ReducingSink()
                    Sink sink = terminalOp.wrapAndCopyInto(sink, spliterator)
                        Sink wrappedSink = wrapSink(sink)
                            // AbstractPipeline#wrapSink(Sink)
                            for (/* executed twice */) { p.opWrapSink(p.previousStage.combinedFlags, sink) }
                                return new Sink.ChainedReference(sink)
                        terminalOp.copyInto(wrappedSink, spliterator);
                            // AbstractPipeline#copyInto()
                            if (!StreamOpFlag.SHORT_CIRCUIT.isKnown(getStreamAndOpFlags())) {
                            /*>*/wrappedSink.begin(spliterator.getExactSizeIfKnown());
                            /*>*/ /* not important */
                            /*>*/supplier.get() // initializes ArrayList
                            /*>*/spliterator.forEachRemaining(wrappedSink)
                                    // Spliterators#ArraySpliterator#foreachRemaining(Consumer):
                                    // ... unimportant code
!!                                  do {
                                    /*>*/action.accept((String)a[i])
                                    } while (++i < hi) // for each fish :)
                            /*>*/wrappedSink.end() // no-op
                            } else { /* not executed */}
                        return sink;
                    return sink.get()
            }
    /*>*/if (collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)) { return container; }
    /*>*/else { /* not executed */ }

The exclamation marks point to the actual workhorse: a do-while loop in fishList's Spliterator. Here is more detailed trace of the do-while loop:

do {
/*>*/action.accept((String)a[i])
    if (predicate.test(u)) { downstream.accept(u); }  // predicate is our fishPredicate
        downstream.accept(mapper.apply(u)); // mapper is our fishFunction
            accumulator.accept(u)
                // calls add(u) on resulting ArrayList
} while (++i < hi) // for each fish :)

Streams API with Lambdas on Bytecode Level

Let's look at how the relevant parts of the executed code look like in the bytecode. The interesting part is how

fishList.stream().filter(f -> f.contains("fish")).map(f -> f.substring(0, 1).toUpperCase() + f.ubstring(1)).collect(Collectors.toList());

is translated. You can find the full version on pastebin. I will focus only on
filter(f -> f.contains("fish")) here:

invokedynamic #26,  0         // InvokeDynamic #0:test:()Ljava/util/function/Predicate; [
    java/lang/invoke/LambdaMetafactory.metafactory(Ljava/lang/invoke/MethodHandles$Lookup;Ljava/lang/String;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodType;Ljava/lang/invoke/MethodHandle;Ljava/lang/invoke/MethodType;)Ljava/lang/invoke/CallSite;
    (Ljava/lang/Object;)Z, 
    FishTest.lambda$fish8$0(Ljava/lang/String;)Z, 
    (Ljava/lang/String;)Z
  ]
invokeinterface #27,  2       // InterfaceMethod java/util/stream/Stream.filter:(Ljava/util/function/Predicate;)Ljava/util/stream/Stream;

There is nothing specific to streams API there, but the new invokedynamic instruction is used to create lambdas. The Java 7 equivalent of lambdas would be creation of anonymous inner class implementing Predicate. This would be translated to bytecode as:

new FishTest$1                        // create new instance of Predicate
dup
invokespecial FishTest$1.<init>()V    // call constructor

Creating a lambda in Java 8 is translated as a single invokedynamic instruction instead, without creation of a new object. The purpose of invokedynamic instruction is to defer creation of lambda to runtime (as opposed to compile time). This enables features like caching lambda instances:

The use of invokedynamic lets us defer the selection of a translation strategy until run time. The runtime implementation is free to select a strategy dynamically to evaluate the lambda expression. ... The invokedynamic mechanics allow this to be done without the performance costs that this late binding approach might otherwise impose. ... For example, ... we generate the class the first time a given lambda factory site is called. Thereafter, future calls to that lambda factory site will re-use the class generated on the first call.

Arguments of invokedynamic give a "recipe" for constructing an instance of the respective functional interface. They represent the metafactory for runtime instance creation, reference to the method it implements (i.e. Predicate.test()) and implementation of the method.
In our case, the implementation is invocation of static method boolean lambda$fish8$0(String) which the compiler sneaks in to our class. It contains the actual bytecode for f.contains("fish"). If you used lambda capturing method references (e.g. list::add), captured variables from the outer scope, etc., things would get more complex - look for occurences of "indy" in this document for more info.

The other parts of bytecode are less interesting. The do-while loop, apart from the obvious looping, contains an invokeinterface instruction calling accept() on the respective Consumer. accept() call propagates down the sinks, calling our lambdas along the way. Nothing special here, both lambda calls, and propagation through sinks are simple invokeinterface instructions.

How to read the pseudocode

_{Indentation is used to show the unfolded body of call above the indented code. Code beggining with /*>*/ represents continuation of the current call (when needed for better readability). Therefore call}

Objects.requireNonNull(new Object());

_{would be written in the trace pseudocode as:}

Object o = new Object(); // extracted variable to improve visibility of new instance creation
Objects.requireNonNull(o);
    // this is the body of Objects.requireNonNull():
    if (o == null) {
    /*>*/throw new NullPointerException(); // this line is still part of  requireNonNull() body
    }
    return o;

_{I also skipped some unimportant calls like null checks, omitted generic parameters, extracted inlined expressions to variables where appropriate, etc., to improve readability.}

121

answered Oct 02 '22 10:10

Mifeet

Related questions
                            
                                JAX-RS: How to secure REST endpoints?
                            
                                Encrypt in java and Decrypt in C# For AES 256 bit
                            
                                How to design push notifications using Atmosphere
                            
                                Event fired only once when watching a directory
                            
                                Create mutli-page document dynamically using PDFBox
                            
                                Using JDK 7 Or Higher With Android Studio And Eclipse On Mac OSX
                            
                                Can Intellij Idea be configured to break when a JUnit AssertionError occurs?
                            
                                How to get byte[] from javafx imageView?
                            
                                How to use synchronized blocks across classes?
                            
                                Exclude specific tests from being run in parallel in jUnit
                            
                                Why ternary operation gives nullpointer while its ifelse counterpart doesn't? [duplicate]
                            
                                Java8 dynamic proxy and default methods
                            
                                Contains method for Iterable and Iterator?
                            
                                How can I mock a void method to throw an exception?
                            
                                JsonParseException: Unrecognized token 'http': was expecting ('true', 'false' or 'null')
                            
                                JavaFX - How to delete a specific Node from an AnchorPane
                            
                                Regular expression ^\\Q & \\E [duplicate]
                            
                                Checking doubles for equality and Sonar issues
                            
                                Why are the results of of str == str.intern() for these strings different?
                            
                                Boxed value unboxed then reboxed

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Understanding Java 8 Streams at the Bytecode Level

Tags:

java

lambda

java-8

java-stream

bytecode