Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does Collection.parallelStream() imply a happens-before relationship?

Consider this (completely contrived) Java code:

final List<Integer> s = Arrays.asList(1, 2, 3);
final int[] a = new int[1];
a[0] = 100;
s.parallelStream().forEach(i -> {
    synchronized (a) {
        a[0] += i;
    }
});
System.out.println(a[0]);

Is this code guaranteed to output "106"?

It seems like it is not unless there is a happens-before relationship established by parallelStream(), by which we can know for sure that the first accesses to a[0] in the lambda will see 100 and not zero (according to my understanding of the Java memory model).

But Collection.parallelStream() is not documented to establish such a relationship...

The same question can be asked for the completion of the parallelStream() method invocation.

So am I missing something, or is it true that for correctness would the above code be required to look something like this instead:

final List<Integer> s = Arrays.asList(1, 2, 3);
final int[] a = new int[1];
synchronized (a) {
    a[0] = 100;
}
s.parallelStream().forEach(i -> {
    synchronized (a) {
        a[0] += i;
    }
});
synchronized (a) {
    System.out.println(a[0]);
}

Or... does parallelStream() actually provide these happens-before relationships, and this simply a matter of some missing documentation?

I'm asking because from an API design perspective, it seems (to me at least) like this would be a logical thing to do... analogous to Thread.start(), etc.

like image 766
Archie Avatar asked Dec 23 '18 18:12

Archie


People also ask

What is parallelStream?

The parallelStream() method of the Collection interface can be used to create a parallel stream with a collection as the datasource. No other code is necessary for parallel execution, as the data partitioning and thread management for a parallel stream are handled by the API and the JVM.

Which method defined by collection is used to obtain a parallel stream?

To create a parallel stream from another stream, use the parallel() method. To create a parallel stream from a Collection use the parallelStream() method.

How many threads does parallelStream use?

In case of Parallel stream,4 threads are spawned simultaneously and it internally using Fork and Join pool to create and manage threads. Parallel streams create ForkJoinPool instance via static ForkJoinPool.

Is stream parallel by default?

parallelStream() streams are always parallel... And then there is Spliterator , and in particular its . characteristics() , one of them being that it can be CONCURRENT , or even IMMUTABLE .


1 Answers

You really should avoid hitting variables 'outside' the pipeline. Even if you get it to work correctly performance will likely suffer. There are a lot of tools to achieve this built into the JDK. For example your use case is probably safer with something like:

Integer reduce = IntStream.of(1, 2, 3)
            .parallel()
            .reduce(100, (accumulator, element) -> accumulator + element);
like image 73
Adam Bickford Avatar answered Sep 24 '22 06:09

Adam Bickford