
Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams?

Environment: Ubuntu x86_64 (14.10), Oracle JDK 1.8u25

I am trying to use a parallel stream from Files.lines(), but I want to .skip() the first line (it's a CSV file with a header). So I tried this:

    try (
        final Stream<String> stream = Files.lines(thePath, StandardCharsets.UTF_8)
            .skip(1L).parallel();
    ) {
        // etc
    }

But then one column failed to parse to an int...

So I tried some simple code. The file in question is dead simple:

    $ cat info.csv
    startDate;treeDepth;nrMatchers;nrLines;nrChars;nrCodePoints;nrNodes
    1422758875023;34;54;151;4375;4375;27486
    $

And the code is equally simple:

    // Files.lines() throws IOException, so main must declare it
    public static void main(final String... args) throws IOException {
        final Path path = Paths.get("/home/fge/tmp/dd/info.csv");
        Files.lines(path, StandardCharsets.UTF_8).skip(1L).parallel()
            .forEach(System.out::println);
    }

And I systematically get the following result, that is, the header line rather than the data line (OK, I have only run it about 20 times):

    startDate;treeDepth;nrMatchers;nrLines;nrChars;nrCodePoints;nrNodes

What am I missing here?


EDIT: It seems the problem, or misunderstanding, runs much deeper than that (the two examples below were cooked up by a fellow on FreeNode's ##java):

    public static void main(final String... args) {
        new BufferedReader(new StringReader("Hello\nWorld")).lines()
            .skip(1L).parallel()
            .forEach(System.out::println);

        final Iterator<String> iter
            = Arrays.asList("Hello", "World").iterator();
        final Spliterator<String> spliterator
            = Spliterators.spliteratorUnknownSize(iter, Spliterator.ORDERED);
        final Stream<String> s
            = StreamSupport.stream(spliterator, true);

        s.skip(1L).forEach(System.out::println);
    }

This prints:

    Hello
    Hello

Uh.

@Holger suggested that this happens for any stream which is ORDERED and not SIZED, and offered this other sample:

    Stream.of("Hello", "World")
        .filter(x -> true)
        .parallel()
        .skip(1L)
        .forEach(System.out::println);
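To see which streams fall into the "ORDERED and not SIZED" category, one can inspect the characteristics reported by a stream's spliterator. The following is only a sketch: the class name `CharacteristicsCheck` and the helper `describe` are made up for illustration, and it reports what the spliterator says, not how skip() behaves.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Spliterator;
import java.util.stream.Stream;

final class CharacteristicsCheck {
    // Hypothetical helper: reports whether the stream's spliterator
    // advertises the ORDERED and SIZED characteristics.
    static String describe(final Stream<String> stream) {
        final Spliterator<String> sp = stream.spliterator();
        return "ORDERED=" + sp.hasCharacteristics(Spliterator.ORDERED)
            + " SIZED=" + sp.hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(final String... args) {
        // Array-backed source: the size is known up front
        System.out.println(describe(Stream.of("Hello", "World")));
        // filter() may drop elements, so the size is no longer known
        System.out.println(describe(Stream.of("Hello", "World").filter(x -> true)));
        // Line-based source: the size is unknown until the input is read
        System.out.println(describe(
            new BufferedReader(new StringReader("Hello\nWorld")).lines()));
    }
}
```

The filtered stream and the line-based stream both report ORDERED without SIZED, which matches the shape of the failing samples.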

It also emerges from the discussion so far that the problem (if it is one) lies with .forEach() (as @SotiriosDelimanolis first pointed out).
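If .forEach() is indeed the culprit, one thing to try is .forEachOrdered(), which is documented to process elements in encounter order. This is a minimal sketch, not verified here on 8u25; the class name `OrderedSkip` and the method `skipFirstOrdered` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

final class OrderedSkip {
    // Same pipeline as the sample above, but with a terminal
    // operation that respects encounter order.
    static List<String> skipFirstOrdered() {
        final List<String> out = Collections.synchronizedList(new ArrayList<>());
        Stream.of("Hello", "World")
            .filter(x -> true)
            .parallel()
            .skip(1L)
            .forEachOrdered(out::add);
        return out;
    }

    public static void main(final String... args) {
        System.out.println(skipFirstOrdered());
    }
}
```

The trade-off is that forEachOrdered() gives up some of the parallelism in the terminal step, since elements must be handed to the action in order.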

fge asked Feb 01 '15 04:02




1 Answer

Since the current state of the issue is quite the opposite of the earlier statements made here, it should be noted that there is now an explicit statement by Brian Goetz that the back-propagation of the unordered characteristic past a skip operation is considered a bug. It is also stated that the intent is now to have no back-propagation of the ordered-ness of a terminal operation at all.

There is also a related bug report, JDK-8129120, whose status is “fixed in Java 9”; the fix has been backported to Java 8, update 60.

I did some tests with jdk1.8.0_60 and it seems that the implementation now indeed exhibits the more intuitive behavior.
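A test along those lines could look like the sketch below. It is not the test code actually used for this answer; the class name `SkipCheck` and the method `parallelSkipFirst` are made up for illustration. It runs the BufferedReader sample from the question with the unordered .forEach() and collects what reaches the terminal operation.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class SkipCheck {
    // Returns the lines that survive skip(1) in a parallel pipeline
    // ending in the unordered forEach().
    static List<String> parallelSkipFirst(final String text) {
        // Thread-safe sink, because forEach() may call it from several threads
        final Queue<String> seen = new ConcurrentLinkedQueue<>();
        new BufferedReader(new StringReader(text)).lines()
            .skip(1L).parallel()
            .forEach(seen::add);
        return new ArrayList<>(seen);
    }

    public static void main(final String... args) {
        System.out.println(System.getProperty("java.version")
            + ": " + parallelSkipFirst("Hello\nWorld"));
    }
}
```

On a runtime that includes the fix, the list should contain only World; on an affected one, the question's output suggests it would contain Hello instead.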

Holger answered Oct 20 '22 17:10
