I just read about Branch-Prediction and wanted to try how this works with Java 8 Streams.
However the performance with Streams is always turning out to be worse than traditional loops.
int totalSize = 32768; int filterValue = 1280; int[] array = new int[totalSize]; Random rnd = new Random(0); int loopCount = 10000; for (int i = 0; i < totalSize; i++) { // array[i] = rnd.nextInt() % 2560; // Unsorted Data array[i] = i; // Sorted Data } long start = System.nanoTime(); long sum = 0; for (int j = 0; j < loopCount; j++) { for (int c = 0; c < totalSize; ++c) { sum += array[c] >= filterValue ? array[c] : 0; } } long total = System.nanoTime() - start; System.out.printf("Conditional Operator Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9)); start = System.nanoTime(); sum = 0; for (int j = 0; j < loopCount; j++) { for (int c = 0; c < totalSize; ++c) { if (array[c] >= filterValue) { sum += array[c]; } } } total = System.nanoTime() - start; System.out.printf("Branch Statement Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9)); start = System.nanoTime(); sum = 0; for (int j = 0; j < loopCount; j++) { sum += Arrays.stream(array).filter(value -> value >= filterValue).sum(); } total = System.nanoTime() - start; System.out.printf("Streams Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9)); start = System.nanoTime(); sum = 0; for (int j = 0; j < loopCount; j++) { sum += Arrays.stream(array).parallel().filter(value -> value >= filterValue).sum(); } total = System.nanoTime() - start; System.out.printf("Parallel Streams Time : %d ns, (%f sec) %n", total, total / Math.pow(10, 9));
Output :
For Sorted-Array :
Conditional Operator Time : 294062652 ns, (0.294063 sec) Branch Statement Time : 272992442 ns, (0.272992 sec) Streams Time : 806579913 ns, (0.806580 sec) Parallel Streams Time : 2316150852 ns, (2.316151 sec)
For Un-Sorted Array:
Conditional Operator Time : 367304250 ns, (0.367304 sec) Branch Statement Time : 906073542 ns, (0.906074 sec) Streams Time : 1268648265 ns, (1.268648 sec) Parallel Streams Time : 2420482313 ns, (2.420482 sec)
I tried the same code using List:list.stream()
instead of Arrays.stream(array)
list.get(c)
instead of array[c]
Output :
For Sorted-List :
Conditional Operator Time : 860514446 ns, (0.860514 sec) Branch Statement Time : 663458668 ns, (0.663459 sec) Streams Time : 2085657481 ns, (2.085657 sec) Parallel Streams Time : 5026680680 ns, (5.026681 sec)
For Un-Sorted List
Conditional Operator Time : 704120976 ns, (0.704121 sec) Branch Statement Time : 1327838248 ns, (1.327838 sec) Streams Time : 1857880764 ns, (1.857881 sec) Parallel Streams Time : 2504468688 ns, (2.504469 sec)
I referred to few blogs this & this which suggest the same performance issue w.r.t streams.
Advantages of the streams:Your stream-handling code doesn't need to know the source of the stream or its eventual terminating method. Streams can succinctly express quite sophisticated behavior. Streams can be a replacement for looping because they allow for the processing of a sequence of data (similarly to a loop).
There are a lot of benefits to using streams in Java, such as the ability to write functions at a more abstract level which can reduce code bugs, compact functions into fewer and more readable lines of code, and the ease they offer for parallelization.
Performance. There are many opinions about which style performs better. The short version basically is, if you have a small list; for loops perform better, if you have a huge list; a parallel stream will perform better.
Again, the for- loop is faster that the sequential stream operation, but the difference on an ArrayList is not nearly as significant as it was on an array.
I agree to the point that programming with streams is nice and easier for some scenarios but when we're losing out on performance, why do we need to use them?
Performance is rarely an issue. It would be usual for 10% of your streams would need to be rewritten as loops to get the performance you need.
Is there something I'm missing out on?
Using parallelStream() is much easier using streams and possibly more efficient as it's hard to write efficient concurrent code.
Which is the scenario in which streams perform equal to loops? Is it only in the case where your function defined takes a lot of time, resulting in a negligible loop performance?
Your benchmark is flawed in the sense that the code hasn't been compiled when it starts. I would do the whole test in a loop as JMH does, or I would use JMH.
In none of the scenario's I could see streams taking advantage of branch-prediction
Branch prediction is a CPU feature not a JVM or streams feature.
Java is a high level language saving the programmer from considering low level performance optimization.
Never choose a certain approach for performance reasons unless you have proven that this is a problem in your real application.
Your measurements show some negative effect for streams, but the difference is below observability. Therefore, it's not a Problem. Also, this Test is a "synthetic" situation and the code may behave completely different in a heavy duty production environment. Furthermore, the machine code created from your Java (byte) code by the JIT may change in future Java (maintenance) releases and make your measurements obsolete.
In conclusion: Choose the syntax or approach that most expresses your (the programmer's) intention. Keep to that same approach or syntax throughout the program unless you have a good reason to change.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With