Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java 8 stream objects significant memory usage

Tags:

In looking at some profiling results, I noticed that using streams within a tight loop (used instead of another nested loop) incurred a significant memory overhead of objects of types java.util.stream.ReferencePipeline and java.util.ArrayList$ArrayListSpliterator. I converted the offending streams to foreach loops, and the memory consumption decreased significantly.

I know that streams make no promises about performing any better than ordinary loops, but I was under the impression that the difference would be negligible. In this case it seemed like it was a 40% increase.

Here is the test class I wrote to isolate the problem. I monitored memory consumption and object allocation with JFR:

import java.util.ArrayList; import java.util.List; import java.util.Optional; import java.util.Random; import java.util.function.Predicate;  public class StreamMemoryTest {      private static boolean blackHole = false;      public static List<Integer> getRandListOfSize(int size) {         ArrayList<Integer> randList = new ArrayList<>(size);         Random rnGen = new Random();         for (int i = 0; i < size; i++) {             randList.add(rnGen.nextInt(100));         }         return randList;     }      public static boolean getIndexOfNothingManualImpl(List<Integer> nums, Predicate<Integer> predicate) {          for (Integer num : nums) {             // Impossible condition             if (predicate.test(num)) {                 return true;             }         }         return false;     }      public static boolean getIndexOfNothingStreamImpl(List<Integer> nums, Predicate<Integer> predicate) {         Optional<Integer> first = nums.stream().filter(predicate).findFirst();         return first.isPresent();     }      public static void consume(boolean value) {         blackHole = blackHole && value;     }      public static boolean result() {         return blackHole;     }      public static void main(String[] args) {         // 100 million trials         int numTrials = 100000000;         System.out.println("Beginning test");         for (int i = 0; i < numTrials; i++) {             List<Integer> randomNums = StreamMemoryTest.getRandListOfSize(100);             consume(StreamMemoryTest.getIndexOfNothingStreamImpl(randomNums, x -> x < 0));             // or ...             // consume(StreamMemoryTest.getIndexOfNothingManualImpl(randomNums, x -> x < 0));             if (randomNums == null) {                 break;             }         }         System.out.print(StreamMemoryTest.result());     } } 

Stream implementation:

Memory Allocated for TLABs 64.62 GB

Class   Average Object Size(bytes)  Total Object Size(bytes)    TLABs   Average TLAB Size(bytes)    Total TLAB Size(bytes)  Pressure(%) java.lang.Object[]                          415.974 6,226,712   14,969  2,999,696.432   44,902,455,888  64.711 java.util.stream.ReferencePipeline$2        64      131,264     2,051   2,902,510.795   5,953,049,640   8.579 java.util.stream.ReferencePipeline$Head     56      72,744      1,299   3,070,768.043   3,988,927,688   5.749 java.util.stream.ReferencePipeline$2$1      24      25,128      1,047   3,195,726.449   3,345,925,592   4.822 java.util.Random                            32      30,976      968     3,041,212.372   2,943,893,576   4.243 java.util.ArrayList                         24      24,576      1,024   2,720,615.594   2,785,910,368   4.015 java.util.stream.FindOps$FindSink$OfRef     24      18,864      786     3,369,412.295   2,648,358,064   3.817 java.util.ArrayList$ArrayListSpliterator    32      14,720      460     3,080,696.209   1,417,120,256   2.042 

Manual implementation:

Memory Allocated for TLABs 46.06 GB

Class   Average Object Size(bytes)  Total Object Size(bytes)    TLABs   Average TLAB Size(bytes)    Total TLAB Size(bytes)  Pressure(%) java.lang.Object[]      415.961     4,190,392       10,074      4,042,267.769       40,721,805,504  82.33 java.util.Random        32          32,064          1,002       4,367,131.521       4,375,865,784   8.847 java.util.ArrayList     24          14,976          624         3,530,601.038       2,203,095,048   4.454 

Has anyone else encountered issues with the stream objects themselves consuming memory? / Is this a known issue?

like image 565
Bryan J Avatar asked Dec 19 '16 20:12

Bryan J


People also ask

Does Java stream save memory?

No storage. Streams don't have storage for values; they carry values from a source (which could be a data structure, a generating function, an I/O channel, etc) through a pipeline of computational steps.

Are Java 8 streams lazy?

The Java 8 Streams API is fully based on the 'process only on demand' strategy and hence supports laziness. In the Java 8 Streams API, the intermediate operations are lazy and their internal processing model is optimised to make it being capable of processing the large amount of data with high performance.

Is Java 8 stream faster than for loop?

Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops.

Why is Java 8 streaming so fast?

In Java8 Streams, performance is achieved by parallelism, laziness, and using short-circuit operations, but there is a downside as well, and we need to be very cautious while choosing Streams, as it may degrade the performance of your application. Let us look at these factors which are meant for Streams' performance.


Video Answer


2 Answers

Using Stream API you indeed allocate more memory, though your experimental setup is somewhat questionable. I've never used JFR, but my findings using JOL are quite similar to yours.

Note that you measure not only the heap allocated during the ArrayList querying, but also during its creation and population. The allocation during the allocation and population of single ArrayList should look like this (64bits, compressed OOPs, via JOL):

 COUNT       AVG       SUM   DESCRIPTION      1       416       416   [Ljava.lang.Object;      1        24        24   java.util.ArrayList      1        32        32   java.util.Random      1        24        24   java.util.concurrent.atomic.AtomicLong      4                 496   (total) 

So the most memory allocated is the Object[] array used inside ArrayList to store the data. AtomicLong is a part of Random class implementation. If you perform this 100_000_000 times, then you should have at least 496*10^8/2^30 = 46.2 Gb allocated in both tests. Nevertheless this part could be skipped as it should be identical for both tests.

Another interesting thing here is inlining. JIT is smart enough to inline the whole getIndexOfNothingManualImpl (via java -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining StreamMemoryTest):

  StreamMemoryTest::main @ 13 (59 bytes)      ...      @ 30   StreamMemoryTest::getIndexOfNothingManualImpl (43 bytes)   inline (hot)        @ 1   java.util.ArrayList::iterator (10 bytes)   inline (hot)         \-> TypeProfile (2132/2132 counts) = java/util/ArrayList          @ 6   java.util.ArrayList$Itr::<init> (6 bytes)   inline (hot)            @ 2   java.util.ArrayList$Itr::<init> (26 bytes)   inline (hot)              @ 6   java.lang.Object::<init> (1 bytes)   inline (hot)        @ 8   java.util.ArrayList$Itr::hasNext (20 bytes)   inline (hot)         \-> TypeProfile (215332/215332 counts) = java/util/ArrayList$Itr          @ 8   java.util.ArrayList::access$100 (5 bytes)   accessor        @ 17   java.util.ArrayList$Itr::next (66 bytes)   inline (hot)          @ 1   java.util.ArrayList$Itr::checkForComodification (23 bytes)   inline (hot)          @ 14   java.util.ArrayList::access$100 (5 bytes)   accessor        @ 28   StreamMemoryTest$$Lambda$1/791452441::test (8 bytes)   inline (hot)         \-> TypeProfile (213200/213200 counts) = StreamMemoryTest$$Lambda$1          @ 4   StreamMemoryTest::lambda$main$0 (13 bytes)   inline (hot)            @ 1   java.lang.Integer::intValue (5 bytes)   accessor        @ 8   java.util.ArrayList$Itr::hasNext (20 bytes)   inline (hot)          @ 8   java.util.ArrayList::access$100 (5 bytes)   accessor      @ 33   StreamMemoryTest::consume (19 bytes)   inline (hot) 

Disassembly actually shows that no allocation of iterator is performed after warm-up. Because escape analysis successfully tells JIT that iterator object does not escape, it's simply scalarized. Were the Iterator actually allocated it would take additionally 32 bytes:

 COUNT       AVG       SUM   DESCRIPTION      1        32        32   java.util.ArrayList$Itr      1                  32   (total) 

Note that JIT could also remove iteration at all. Your blackhole is false by default, so doing blackhole = blackhole && value does not change it regardless of the value, and value calculation could be excluded at all, as it does not have any side effects. I'm not sure whether it actually did this (reading disassembly is quite hard for me), but it's possible.

However while getIndexOfNothingStreamImpl also seems to inline everything inside, escape analysis fails as there are too many interdependent objects inside the stream API, so actual allocations occur. Thus it really adds five additional objects (the table is composed manually from JOL outputs):

 COUNT       AVG       SUM   DESCRIPTION      1        32        32   java.util.ArrayList$ArrayListSpliterator      1        24        24   java.util.stream.FindOps$FindSink$OfRef      1        64        64   java.util.stream.ReferencePipeline$2      1        24        24   java.util.stream.ReferencePipeline$2$1      1        56        56   java.util.stream.ReferencePipeline$Head      5                 200   (total) 

So every invocation of this particular stream actually allocates 200 additional bytes. As you perform 100_000_000 iterations, in total Stream version should allocate 10^8*200/2^30 = 18.62Gb more than manual version which is close to your result. I think, AtomicLong inside Random is scalarized as well, but both Iterator and AtomicLong are present during the warmup iterations (until JIT actually creates the most optimized version). This would explain the minor discrepancies in the numbers.

This additional 200 bytes allocation does not depend on the stream size, but depends on the number of intermediate stream operations (in particular, every additional filter step would add 64+24=88 bytes more). However note that these objects are usually short-lived, allocated quickly and can be collected by minor GC. In most of real-life applications you probably should not have to worry about this.

like image 127
Tagir Valeev Avatar answered Nov 08 '22 01:11

Tagir Valeev


Not only more memory due to the infrastructure that is needed to build the Stream API. But also, it might to be slower in terms of speed (at least for this small inputs).

There is this presentation from one of the developers from Oracle (it is in russian, but that is not the point) that shows a trivial example (not much more complicated then yours) where the speed of execution is 30% worse in case of Streams vs Loops. He says that's pretty normal.

One thing that I've notice that not a lot of people realize is that using Streams (lambda's and method references to be more precise) will also create (potentially) a lot of classes that you will not know about.

Try to run your example with :

  -Djdk.internal.lambda.dumpProxyClasses=/Some/Path/Of/Yours 

And see how many additional classes will be created by your code and the code that Streams need (via ASM)

like image 37
Eugene Avatar answered Nov 08 '22 02:11

Eugene