
Parallel Infinite Java Streams run out of Memory

I'm trying to understand why the following Java program gives an OutOfMemoryError, while the corresponding program without .parallel() doesn't.

System.out.println(Stream
    .iterate(1, i -> i+1)
    .parallel()
    .flatMap(n -> Stream.iterate(n, i -> i+n))
    .mapToInt(Integer::intValue)
    .limit(100_000_000)
    .sum()
);

I have two questions:

  1. What is the intended output of this program?

    Without .parallel() it seems to simply output sum(1+2+3+...), which means that it simply "gets stuck" at the first inner stream of the flatMap, which makes sense.

    With parallel I don't know if there is an expected behaviour, but my guess would be that it somehow interleaves the first n or so streams, where n is the number of parallel workers. It could also differ slightly depending on the chunking/buffering behaviour.

  2. What causes it to run out of memory? I'm specifically trying to understand how these streams are implemented under the hood.

    I'm guessing that something blocks the stream, so it never finishes and can never discard the generated values, but I don't quite know in which order things are evaluated and where buffering occurs.
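The sequential observation above can be made concrete with a bounded stand-in (the limits are mine, added so the pipeline stays finite on any Java version; this is not the original program): sequential flatMap emits each inner stream whole, in encounter order, so a limit smaller than the first inner stream never reaches the second.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EncounterOrder {
    // Bounded stand-in for the question's pipeline: flatMap substitutes each
    // outer element with its whole inner stream, in encounter order, before
    // moving on to the next one.
    static List<Integer> firstTwelve() {
        return Stream.iterate(1, i -> i + 1).limit(3)
                .flatMap(k -> Stream.iterate(k, i -> i + k).limit(4))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Inner streams for k = 1, 2, 3, one after another:
        System.out.println(firstTwelve()); // [1, 2, 3, 4, 2, 4, 6, 8, 3, 6, 9, 12]
    }
}
```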

Edit: In case it is relevant, I'm using Java 11.

Edit 2: Apparently the same thing happens even for the simple program IntStream.iterate(1,i->i+1).limit(1000_000_000).parallel().sum(), so it might have to do with the laziness of limit rather than flatMap.
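For the simple program in Edit 2, a sized source sidesteps the problem entirely (a sketch of an alternative, not an explanation of the failure, and it does not cover the flatMap case): IntStream.rangeClosed reports its exact size up front, so the parallel pipeline can split the work into chunks with known positions instead of buffering an unknown prefix.

```java
import java.util.stream.IntStream;

public class SizedSum {
    // rangeClosed knows its exact size, so parallel splitting and limit need
    // no buffering; widened to long before summing to avoid int overflow.
    static long sumFirst(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .asLongStream()
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumFirst(100_000_000)); // 5000000050000000
    }
}
```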

asked Jan 31 '20 by Thomas Ahle



2 Answers

You say “but I don't quite know in which order things are evaluated and where buffering occurs”, which is precisely what parallel streams are about. The order of evaluation is unspecified.

A critical aspect of your example is the .limit(100_000_000). This implies that the implementation can’t just sum up arbitrary values, but must sum up the first 100,000,000 numbers. Note that in the reference implementation, .unordered().limit(100_000_000) doesn’t change the outcome, which indicates that there’s no special implementation for the unordered case, but that’s an implementation detail.

Now, when worker threads process the elements, they can't just sum them up, as they have to know which elements they are allowed to consume, which depends on how many elements precede their specific workload. Since this stream doesn't know its size, this can only be known when the prefix elements have been processed, which never happens for infinite streams. So the worker threads keep buffering until this information becomes available.
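The "doesn't know the sizes" point can be probed directly (my own check, not part of the answer): the spliterator behind an iterate stream reports no SIZED characteristic and an unknown estimate, while a range-based stream reports its exact size.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class SizeProbe {
    // An iterate-based stream cannot report a size: its spliterator lacks the
    // SIZED characteristic, and estimateSize() is Long.MAX_VALUE ("unknown").
    static boolean iterateKnowsSize() {
        Spliterator<Integer> sp = Stream.iterate(1, i -> i + 1).spliterator();
        return sp.hasCharacteristics(Spliterator.SIZED);
    }

    // A range knows exactly how many elements precede any split-off chunk.
    static long rangeExactSize() {
        return IntStream.rangeClosed(1, 100).spliterator().getExactSizeIfKnown();
    }

    public static void main(String[] args) {
        System.out.println(iterateKnowsSize()); // false
        System.out.println(rangeExactSize());   // 100
    }
}
```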

In principle, when a worker thread knows that it processes the leftmost¹ work-chunk, it could sum up the elements immediately, count them, and signal the end when reaching the limit. So the Stream could terminate, but this depends on a lot of factors.

In your case, a plausible scenario is that the other worker threads are faster in allocating buffers than the leftmost job is counting. In this scenario, subtle changes to the timing could make the stream occasionally return with a value.

When we slow down all worker threads except the one processing the leftmost chunk, we can make the stream terminate (at least in most runs):

System.out.println(IntStream
    .iterate(1, i -> i+1)
    .parallel()
    .peek(i -> { if(i != 1) LockSupport.parkNanos(1_000_000_000); })
    .flatMap(n -> IntStream.iterate(n, i -> i+n))
    .limit(100_000_000)
    .sum()
);

¹ I’m following a suggestion by Stuart Marks to use left-to-right order when talking about the encounter order rather than the processing order.

answered Oct 09 '22 by Holger


My best guess is that adding parallel() changes the internal behavior of flatMap(), which already had problems with lazy evaluation before.

The OutOfMemoryError that you are getting was reported in [JDK-8202307] Getting a java.lang.OutOfMemoryError: Java heap space when calling Stream.iterator().next() on a stream which uses an infinite/very big Stream in flatMap. If you look at the ticket it's more or less the same stack trace that you are getting. The ticket was closed as Won't Fix with the following reason:

The iterator() and spliterator() methods are "escape hatches" to be used when it's not possible to use other operations. They have some limitations because they turn what is a push model of the stream implementation into a pull model. Such a transition requires buffering in certain cases, such as when an element is (flat) mapped to two or more elements. It would significantly complicate the stream implementation, likely at the expense of common cases, to support a notion of back-pressure to communicate how many elements to pull through nested layers of element production.
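The push-to-pull transition the ticket describes can be observed safely on a finite stream (my own sketch; the finite bounds are there precisely so it cannot exhaust the heap): iterator() is the "escape hatch" that adapts the push-based pipeline to pull-based consumption, and with flatMap each pull may force a whole inner stream through the buffer.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class IteratorEscapeHatch {
    // iterator() turns the push-based stream pipeline into a pull-based one.
    // With flatMap, pulling one element may buffer the entire inner stream;
    // harmless here because every inner stream has just two elements.
    static List<Integer> pullAll() {
        Iterator<Integer> it = Stream.of(1, 2, 3)
                .flatMap(n -> Stream.of(n, n * 10))
                .iterator();
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pullAll()); // [1, 10, 2, 20, 3, 30]
    }
}
```

With an infinite inner stream, as in the question, that same per-pull buffering is what fills the heap.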

answered Oct 09 '22 by Karol Dowbecki