The JavaDocs for <code>findFirst</code> say that if the stream has an encounter order, then the first element will always be returned, but if the stream does not have an encounter order, any element may be returned. I'm trying to demonstrate how this works on a stream without an encounter order, but I can't get it to return anything other than the actual first element. I tried adding the elements to a <code>Set</code>, which does not have a defined encounter order: <pre class="prettyprint"><code> Set<String> words = new HashSet<>(); words.addAll(Arrays.asList("this", "is", "a", "stream", "of", "strings")); Optional<String> firstString = words.stream() .findFirst(); System.out.println(firstString); </code></pre> Every time I run, I get <code>a</code> as the first string. Then I tried doing a <code>Collections.shuffle</code> on the <code>List</code> before adding it to the <code>Set</code>, but that didn't change anything. <pre class="prettyprint"><code> List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings"); words = new HashSet<>(); words.addAll(wordList); firstString = words.stream() .findFirst(); System.out.println(firstString); </code></pre> I still get back the word <code>a</code> every time. Then I tried using the <code>unordered</code> method from <code>BaseStream</code>, which claims to return a stream without encounter order, but no difference: <pre class="prettyprint"><code> firstString = Stream.of("this", "is", "a", "stream", "of", "strings") .unordered() .findFirst(); System.out.println(firstString); </code></pre> Now I get the word <code>this</code> every time. Am I missing something? Is there some way to demonstrate that <code>findFirst</code> on an unordered stream returns different values?

Well, “any” includes the possibility of “first”. Of course, the Stream implementation does not waste efforts in randomizing the data, so for a lot of cases, especially with sequential execution, it will still be the first element, if we can call it that way (as without an order, there is no distinguished first element). Your best chances of exhibiting different results for <code>findFirst</code> are with parallel Streams. But even there, not every combination of operations is suitable of exhibiting the unorderedness. One point is, that in the current implementation, the <code>findFirst()</code> operation does not change it’s behavior when the Stream is unordered, i.e. it does not actively try to be like <code>findAny()</code>. It may still exhibit unpredictable behavior due to the source of the Stream, but if your source is <code>Stream.of("this", "is", "a", "stream", "of", "strings")</code>, i.e. an immutable sequence of a known size, it has already the best parallel performance possible, so there’s simply no way of drawing a benefit of the chained <code>unordered()</code>, hence, the current implementation doesn’t change its behavior. It might surprise, but this even applies to a <code>HashSet</code> to some degree. While it has an unspecified order, there will be an actual order within its backing array at some point of time and as long as you don’t modify the <code>Set</code>, there will be no reason to shuffle these entries around, so for a particular <code>HashSet</code> instance, you may repeatedly get the same “first” element, though it’s not specified which one and even within a single runtime, another <code>HashSet</code> instance representing the same contents, but having a different history, may have a different order. <hr> One example of an operation that is known to draw a benefit from the unordered characteristics is <code>distinct</code>. While it has to sort out duplicates, it has to keep the first encountered of equal elements, if it makes a notable difference. This can degrade the performance significantly, hence, the implementation will immediately try to get a benefit if the stream is unordered. E.g. <pre class="prettyprint"><code>List<String> equal=IntStream.range(0, 100) .mapToObj(i->new String("test")) // don't do this in normal code .collect(Collectors.toList()); Map<String, Integer> map = IntStream.range(0, equal.size()) .collect(IdentityHashMap::new, (m,i)->m.put(equal.get(i),i), Map::putAll); equal.parallelStream().distinct().map(map::get) .findFirst().ifPresent(System.out::println); </code></pre> This creates a bunch of <code>equal</code> but distinguishable <code>String</code> instances (which you normally shouldn’t do), registers them with their positional number in an <code>IdentityHashMap</code>, so we can find out, which instance <code>distinct</code> has retained. Since the code above uses an ordered stream created by a <code>List</code>, it consistently prints <code>0</code>, regardless of how often you execute it. In contrast, <pre class="prettyprint"><code>equal.parallelStream().unordered().distinct().map(map::get) .findFirst().ifPresent(System.out::println); </code></pre> will print arbitrary numbers of the range, as we have released the ordered contract and allow to pick any of the equal strings. <hr> As already noted before, this is all implementation specific. You should never make an assumption about whether an operation can actually draw a benefit and thus, will change its behavior for unordered streams. The explanation above was only meant to illustrate why sometimes the behavior of a particular implementation might not change for unordered stream. Though, it still might in the next version or a different JRE implementation.

Holger has already ably explained the situation. (+1) I'd like to provide a demonstration of <code>HashSet</code> instances that have the same contents but that have different iteration order. First we create a set as before: <pre class="prettyprint"><code> List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings"); Set<String> words = new HashSet<>(wordList); </code></pre> We create another set of words, add a bunch of stuff (doesn't matter exactly what it is), and then remove it: <pre class="prettyprint"><code> Set<String> words2 = new HashSet<>(wordList); IntStream.range(0, 50).forEachOrdered(i -> words2.add(String.valueOf(i))); words2.retainAll(wordList); </code></pre> If we inspect the results as follows: <pre class="prettyprint"><code> System.out.println(words.equals(words2)); System.out.println(words); System.out.println(words2); </code></pre> we can see from the output that the sets are equal but iterate in a different order: <pre class="prettyprint"><code>true [a, strings, stream, of, this, is] [this, is, strings, stream, of, a] </code></pre> As noted elsewhere, if you get a stream from these and call <code>findFirst()</code>, the result is the first element in iteration order, which will clearly differ between these sets. What happened is that by adding and removing a bunch of elements, we've caused the set to increase its internal table size, requiring the elements to be rehashed. The original elements end up in different relative positions in the new table, even after the new elements have been removed. Although <code>HashSets</code> have no specified iteration order, the order is likely to be repeatable (and even predictable) if the set is initialized with the same contents in the same way every time. We thus say that the stream from a set has no defined encounter order, even though the order is often the same each time. Note that in JDK 9, the new immutable sets (and maps) are actually randomized, so their iteration orders will change from run to run, even if they are initialized the same way every time.

Java 8 findFirst and encounter order

Tags:

The JavaDocs for findFirst say that if the stream has an encounter order, then the first element will always be returned, but if the stream does not have an encounter order, any element may be returned.

I'm trying to demonstrate how this works on a stream without an encounter order, but I can't get it to return anything other than the actual first element.

I tried adding the elements to a Set, which does not have a defined encounter order:

    Set<String> words = new HashSet<>();     words.addAll(Arrays.asList("this", "is", "a", "stream", "of", "strings"));     Optional<String> firstString = words.stream()             .findFirst();     System.out.println(firstString);

Every time I run, I get a as the first string. Then I tried doing a Collections.shuffle on the List before adding it to the Set, but that didn't change anything.

    List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings");     words = new HashSet<>();     words.addAll(wordList);     firstString = words.stream()             .findFirst();     System.out.println(firstString);

I still get back the word a every time.

Then I tried using the unordered method from BaseStream, which claims to return a stream without encounter order, but no difference:

    firstString = Stream.of("this", "is", "a", "stream", "of", "strings")             .unordered()             .findFirst();     System.out.println(firstString);

Now I get the word this every time. Am I missing something? Is there some way to demonstrate that findFirst on an unordered stream returns different values?

944

asked Jan 27 '17 12:01

kousen

2 Answers

Well, “any” includes the possibility of “first”. Of course, the Stream implementation does not waste efforts in randomizing the data, so for a lot of cases, especially with sequential execution, it will still be the first element, if we can call it that way (as without an order, there is no distinguished first element).

Your best chances of exhibiting different results for findFirst are with parallel Streams. But even there, not every combination of operations is suitable of exhibiting the unorderedness.

One point is, that in the current implementation, the findFirst() operation does not change it’s behavior when the Stream is unordered, i.e. it does not actively try to be like findAny(). It may still exhibit unpredictable behavior due to the source of the Stream, but if your source is Stream.of("this", "is", "a", "stream", "of", "strings"), i.e. an immutable sequence of a known size, it has already the best parallel performance possible, so there’s simply no way of drawing a benefit of the chained unordered(), hence, the current implementation doesn’t change its behavior.

It might surprise, but this even applies to a HashSet to some degree. While it has an unspecified order, there will be an actual order within its backing array at some point of time and as long as you don’t modify the Set, there will be no reason to shuffle these entries around, so for a particular HashSet instance, you may repeatedly get the same “first” element, though it’s not specified which one and even within a single runtime, another HashSet instance representing the same contents, but having a different history, may have a different order.

One example of an operation that is known to draw a benefit from the unordered characteristics is distinct. While it has to sort out duplicates, it has to keep the first encountered of equal elements, if it makes a notable difference. This can degrade the performance significantly, hence, the implementation will immediately try to get a benefit if the stream is unordered. E.g.

List<String> equal=IntStream.range(0, 100)     .mapToObj(i->new String("test")) // don't do this in normal code     .collect(Collectors.toList()); Map<String, Integer> map = IntStream.range(0, equal.size())     .collect(IdentityHashMap::new, (m,i)->m.put(equal.get(i),i), Map::putAll);  equal.parallelStream().distinct().map(map::get)      .findFirst().ifPresent(System.out::println);

This creates a bunch of equal but distinguishable String instances (which you normally shouldn’t do), registers them with their positional number in an IdentityHashMap, so we can find out, which instance distinct has retained. Since the code above uses an ordered stream created by a List, it consistently prints 0, regardless of how often you execute it.

In contrast,

equal.parallelStream().unordered().distinct().map(map::get)      .findFirst().ifPresent(System.out::println);

will print arbitrary numbers of the range, as we have released the ordered contract and allow to pick any of the equal strings.

As already noted before, this is all implementation specific. You should never make an assumption about whether an operation can actually draw a benefit and thus, will change its behavior for unordered streams. The explanation above was only meant to illustrate why sometimes the behavior of a particular implementation might not change for unordered stream. Though, it still might in the next version or a different JRE implementation.

135

answered Oct 05 '22 08:10

Holger

Holger has already ably explained the situation. (+1) I'd like to provide a demonstration of HashSet instances that have the same contents but that have different iteration order. First we create a set as before:

    List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings");     Set<String> words = new HashSet<>(wordList);

We create another set of words, add a bunch of stuff (doesn't matter exactly what it is), and then remove it:

    Set<String> words2 = new HashSet<>(wordList);     IntStream.range(0, 50).forEachOrdered(i -> words2.add(String.valueOf(i)));     words2.retainAll(wordList);

If we inspect the results as follows:

    System.out.println(words.equals(words2));     System.out.println(words);     System.out.println(words2);

we can see from the output that the sets are equal but iterate in a different order:

true [a, strings, stream, of, this, is] [this, is, strings, stream, of, a]

As noted elsewhere, if you get a stream from these and call findFirst(), the result is the first element in iteration order, which will clearly differ between these sets.

What happened is that by adding and removing a bunch of elements, we've caused the set to increase its internal table size, requiring the elements to be rehashed. The original elements end up in different relative positions in the new table, even after the new elements have been removed.

Although HashSets have no specified iteration order, the order is likely to be repeatable (and even predictable) if the set is initialized with the same contents in the same way every time. We thus say that the stream from a set has no defined encounter order, even though the order is often the same each time.

Note that in JDK 9, the new immutable sets (and maps) are actually randomized, so their iteration orders will change from run to run, even if they are initialized the same way every time.

answered Oct 05 '22 07:10

Stuart Marks

Related questions
                            
                                defaultdict : first argument must be callable or None
                            
                                Non-interactive SQLite3 usage from bash script
                            
                                Catching SIGTERM vs catching SIGINT
                            
                                Unpivot in spark-sql/pyspark
                            
                                Angular CLI ERROR in Cannot read property 'loadChildren' of null
                            
                                Ngrx: combine two selectors
                            
                                Pandas count null values in a groupby function
                            
                                Command line connection string for EF core database update
                            
                                Redux form defaultValue
                            
                                google mock - can I call EXPECT_CALL multiple times on same mock object?
                            
                                Launching Explorer from WSL
                            
                                Why does abstract class have to implement all methods from interface?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With