
Should I stream multiple times or do all calculations in one stream?

I have the following code:

mostRecentMessageSentDate = messageInfoList
    .stream()
    .findFirst().orElse(new MessageInfo())
    .getSentDate();

unprocessedMessagesCount = messageInfoList
    .stream()
    .filter(messageInfo -> messageInfo.getProcessedDate() == null)
    .count();

hasAttachment = messageInfoList
    .stream()
    .anyMatch(messageInfo -> messageInfo.getAttachmentCount() > 0);

As you can see, I stream the same list three times because I want three different values. With a for-each loop, I could loop just once.

Performance-wise, is it better to do this in a for loop, so that I only loop once? I find the streams much more readable.

Edit: I ran some tests:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Main {

public static void main(String[] args) {

    List<Integer> integerList = populateList();

    System.out.println("Stream time: " + timeStream(integerList));
    System.out.println("Loop time: " + timeLoop(integerList));

}

private static List<Integer> populateList() {
    return IntStream.range(0, 10000000)
            .boxed()
            .collect(Collectors.toList());
}

private static long timeStream(List<Integer> integerList) {
    long start = System.currentTimeMillis();

    Integer first = integerList
            .stream()
            .findFirst().orElse(0);

    long containsNumbersGreaterThan10000 = integerList
            .stream()
            .filter(i -> i > 10000)
            .count();

    boolean has10000 = integerList
            .stream()
            .anyMatch(i -> i == 10000);

    long end = System.currentTimeMillis();

    System.out.println("first: " + first);
    System.out.println("containsNumbersGreaterThan10000: " + containsNumbersGreaterThan10000);
    System.out.println("has10000: " + has10000);

    return end - start;
}

private static long timeLoop(List<Integer> integerList) {
    long start = System.currentTimeMillis();

    Integer first = 0;
    boolean has10000 = false;
    int count = 0;
    long containsNumbersGreaterThan10000 = 0L;
    for (Integer i : integerList) {
        if (count == 0) {
            first = i;
        }

        if (i > 10000) {
            containsNumbersGreaterThan10000++;
        }

        if (!has10000 && i == 10000) {
            has10000 = true;
        }

        count++;
    }

    long end = System.currentTimeMillis();

    System.out.println("first: " + first);
    System.out.println("containsNumbersGreaterThan10000: " + containsNumbersGreaterThan10000);
    System.out.println("has10000: " + has10000);

    return end - start;
}
}

As expected, the for loop is always faster than the streams:

first: 0
containsNumbersGreaterThan10000: 9989999
has10000: true
Stream time: 57
first: 0
containsNumbersGreaterThan10000: 9989999
has10000: true
Loop time: 38

The difference is never significant, though.

findFirst was probably a bad example, because it returns as soon as it has the first element (or immediately if the stream is empty), but I wanted to know whether it made a difference.

I was hoping for a solution that allows multiple calculations from one stream. IntSummaryStatistics doesn't do exactly what I want. I think I'll heed @florian-schaetz and favour readability over a marginal performance increase.

Somaiah Kumbera asked May 29 '17 08:05


1 Answer

You don't iterate through the collection three times.

mostRecentMessageSentDate = messageInfoList
        .stream()
        .findFirst().orElse(new MessageInfo())
        .getSentDate();

The above only looks at the first element: it returns that element's sent date, or the sent date of a new MessageInfo if the collection is empty. It doesn't need to go through the whole collection.

unprocessedMessagesCount = messageInfoList
        .stream()
        .filter(messageInfo -> messageInfo.getProcessedDate() == null)
        .count();

This one keeps only the elements without a processed date and counts them, so it has to go through the whole collection.

hasAttachment = messageInfoList
        .stream()
        .anyMatch(messageInfo -> messageInfo.getAttachmentCount() > 0);

The above only needs to go through the elements until it finds a message with an attachment.

So, of the three streams, only one is required to go through the whole collection. In the worst case you iterate over it twice (the second and, potentially, the third stream).
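To make the short-circuiting described above visible, here is a minimal sketch that counts how many elements each kind of pipeline actually pulls from the list. The class name ShortCircuitDemo and its helper methods are made up for illustration, and a plain list of integers stands in for messageInfoList; peek() is used purely as a counting hook.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the question's code.
public class ShortCircuitDemo {

    // Number of elements pulled from the list before findFirst() returns.
    public static int visitedByFindFirst(List<Integer> list) {
        AtomicInteger visited = new AtomicInteger();
        list.stream()
            .peek(i -> visited.incrementAndGet())
            .findFirst();
        return visited.get();
    }

    // Number of elements pulled before anyMatch() finds the target (or runs out).
    public static int visitedByAnyMatch(List<Integer> list, int target) {
        AtomicInteger visited = new AtomicInteger();
        list.stream()
            .peek(i -> visited.incrementAndGet())
            .anyMatch(i -> i == target);
        return visited.get();
    }

    // Number of elements pulled by a filter + count pipeline.
    public static int visitedByFilterCount(List<Integer> list, int threshold) {
        AtomicInteger visited = new AtomicInteger();
        list.stream()
            .peek(i -> visited.incrementAndGet())
            .filter(i -> i > threshold)
            .count();
        return visited.get();
    }

    public static void main(String[] args) {
        List<Integer> list = IntStream.range(0, 10)
                .boxed()
                .collect(Collectors.toList());

        System.out.println("findFirst visited:    " + visitedByFindFirst(list));
        System.out.println("anyMatch visited:     " + visitedByAnyMatch(list, 3));
        System.out.println("filter+count visited: " + visitedByFilterCount(list, 5));
    }
}
```

On a sequential stream over the ten elements 0..9, findFirst touches a single element, anyMatch stops at the first match, and only the filter + count pipeline walks the full list.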

This could probably be done more efficiently with a regular for-each loop, but do you really need that? If your collection only contains a few objects, I wouldn't bother optimizing it.

However, with a traditional For-Each loop, you could combine the last two streams:

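// long, to match the type returned by Stream.count()
long unprocessedMessagesCount = 0;
boolean hasAttachment = false;

for (MessageInfo messageInfo : messageInfoList) {
  // Count the messages that have not been processed yet
  if (messageInfo.getProcessedDate() == null) {
    unprocessedMessagesCount++;
  }
  // Once an attachment has been found, skip the check for the remaining messages
  if (!hasAttachment && messageInfo.getAttachmentCount() > 0) {
    hasAttachment = true;
  }
}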

It is really up to you whether you think this is a better solution (I also find the streams more readable). I don't see a way to combine the three streams into one, at least not in a more readable way.
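For completeness, on Java 12 or later the last two calculations can be folded into one pass with Collectors.teeing, which feeds every element to two downstream collectors and merges their results. This is only a sketch under assumptions: it requires a newer JDK than the question may be using, and the MessageInfo and Summary classes below are hypothetical stand-ins that model just the two getters involved. Whether it reads better than the two separate streams is debatable.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; requires Java 12+ for Collectors.teeing.
public class SinglePassSummary {

    // Stand-in for the question's MessageInfo (assumed shape, not the real class).
    public static final class MessageInfo {
        private final LocalDate processedDate; // null means "not processed yet"
        private final int attachmentCount;

        public MessageInfo(LocalDate processedDate, int attachmentCount) {
            this.processedDate = processedDate;
            this.attachmentCount = attachmentCount;
        }

        public LocalDate getProcessedDate() { return processedDate; }
        public int getAttachmentCount() { return attachmentCount; }
    }

    // Hypothetical holder for the two results.
    public static final class Summary {
        public final long unprocessedMessagesCount;
        public final boolean hasAttachment;

        public Summary(long unprocessedMessagesCount, boolean hasAttachment) {
            this.unprocessedMessagesCount = unprocessedMessagesCount;
            this.hasAttachment = hasAttachment;
        }
    }

    // One traversal: both downstream collectors see every element.
    public static Summary summarize(List<MessageInfo> messageInfoList) {
        return messageInfoList.stream().collect(Collectors.teeing(
                Collectors.filtering(m -> m.getProcessedDate() == null, Collectors.counting()),
                Collectors.filtering(m -> m.getAttachmentCount() > 0, Collectors.counting()),
                (unprocessed, withAttachment) -> new Summary(unprocessed, withAttachment > 0)));
    }

    public static void main(String[] args) {
        List<MessageInfo> messages = List.of(
                new MessageInfo(null, 0),
                new MessageInfo(LocalDate.of(2017, 5, 29), 2),
                new MessageInfo(null, 0));

        Summary summary = summarize(messages);
        System.out.println("unprocessedMessagesCount: " + summary.unprocessedMessagesCount);
        System.out.println("hasAttachment: " + summary.hasAttachment);
    }
}
```

Note the trade-off: a collector cannot short-circuit, so the attachment check here always looks at every element, unlike anyMatch. Since the count needs a full pass anyway, that costs one extra comparison per element rather than an extra traversal.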

Magnilex answered Oct 21 '22 06:10
