The JavaDocs for findFirst
say that if the stream has an encounter order, then the first element will always be returned, but if the stream does not have an encounter order, any element may be returned.
I'm trying to demonstrate how this works on a stream without an encounter order, but I can't get it to return anything other than the actual first element.
I tried adding the elements to a Set
, which does not have a defined encounter order:
Set<String> words = new HashSet<>(); words.addAll(Arrays.asList("this", "is", "a", "stream", "of", "strings")); Optional<String> firstString = words.stream() .findFirst(); System.out.println(firstString);
Every time I run, I get a
as the first string. Then I tried doing a Collections.shuffle
on the List
before adding it to the Set
, but that didn't change anything.
List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings"); words = new HashSet<>(); words.addAll(wordList); firstString = words.stream() .findFirst(); System.out.println(firstString);
I still get back the word a
every time.
Then I tried using the unordered
method from BaseStream
, which claims to return a stream without encounter order, but no difference:
firstString = Stream.of("this", "is", "a", "stream", "of", "strings") .unordered() .findFirst(); System.out.println(firstString);
Now I get the word this
every time. Am I missing something? Is there some way to demonstrate that findFirst
on an unordered stream returns different values?
When Stream is unordered, findFirst() and findAny() are the same. But when Stream is ordered, findAny() will be better.
In Java 8 Stream, the findFirst() returns the first element from a Stream, while findAny() returns any element from a Stream.
The findFirst() method finds the first element in a Stream. So, we use this method when we specifically want the first element from a sequence. When there is no encounter order, it returns any element from the Stream. According to the java.
Stream findFirst() in Java with examples Stream findFirst() returns an Optional (a container object which may or may not contain a non-null value) describing the first element of this stream, or an empty Optional if the stream is empty. If the stream has no encounter order, then any element may be returned.
Well, “any” includes the possibility of “first”. Of course, the Stream implementation does not waste efforts in randomizing the data, so for a lot of cases, especially with sequential execution, it will still be the first element, if we can call it that way (as without an order, there is no distinguished first element).
Your best chances of exhibiting different results for findFirst
are with parallel Streams. But even there, not every combination of operations is suitable of exhibiting the unorderedness.
One point is, that in the current implementation, the findFirst()
operation does not change it’s behavior when the Stream is unordered, i.e. it does not actively try to be like findAny()
. It may still exhibit unpredictable behavior due to the source of the Stream, but if your source is Stream.of("this", "is", "a", "stream", "of", "strings")
, i.e. an immutable sequence of a known size, it has already the best parallel performance possible, so there’s simply no way of drawing a benefit of the chained unordered()
, hence, the current implementation doesn’t change its behavior.
It might surprise, but this even applies to a HashSet
to some degree. While it has an unspecified order, there will be an actual order within its backing array at some point of time and as long as you don’t modify the Set
, there will be no reason to shuffle these entries around, so for a particular HashSet
instance, you may repeatedly get the same “first” element, though it’s not specified which one and even within a single runtime, another HashSet
instance representing the same contents, but having a different history, may have a different order.
One example of an operation that is known to draw a benefit from the unordered characteristics is distinct
. While it has to sort out duplicates, it has to keep the first encountered of equal elements, if it makes a notable difference. This can degrade the performance significantly, hence, the implementation will immediately try to get a benefit if the stream is unordered. E.g.
List<String> equal=IntStream.range(0, 100) .mapToObj(i->new String("test")) // don't do this in normal code .collect(Collectors.toList()); Map<String, Integer> map = IntStream.range(0, equal.size()) .collect(IdentityHashMap::new, (m,i)->m.put(equal.get(i),i), Map::putAll); equal.parallelStream().distinct().map(map::get) .findFirst().ifPresent(System.out::println);
This creates a bunch of equal
but distinguishable String
instances (which you normally shouldn’t do), registers them with their positional number in an IdentityHashMap
, so we can find out, which instance distinct
has retained. Since the code above uses an ordered stream created by a List
, it consistently prints 0
, regardless of how often you execute it.
In contrast,
equal.parallelStream().unordered().distinct().map(map::get) .findFirst().ifPresent(System.out::println);
will print arbitrary numbers of the range, as we have released the ordered contract and allow to pick any of the equal strings.
As already noted before, this is all implementation specific. You should never make an assumption about whether an operation can actually draw a benefit and thus, will change its behavior for unordered streams. The explanation above was only meant to illustrate why sometimes the behavior of a particular implementation might not change for unordered stream. Though, it still might in the next version or a different JRE implementation.
Holger has already ably explained the situation. (+1) I'd like to provide a demonstration of HashSet
instances that have the same contents but that have different iteration order. First we create a set as before:
List<String> wordList = Arrays.asList("this", "is", "a", "stream", "of", "strings"); Set<String> words = new HashSet<>(wordList);
We create another set of words, add a bunch of stuff (doesn't matter exactly what it is), and then remove it:
Set<String> words2 = new HashSet<>(wordList); IntStream.range(0, 50).forEachOrdered(i -> words2.add(String.valueOf(i))); words2.retainAll(wordList);
If we inspect the results as follows:
System.out.println(words.equals(words2)); System.out.println(words); System.out.println(words2);
we can see from the output that the sets are equal but iterate in a different order:
true [a, strings, stream, of, this, is] [this, is, strings, stream, of, a]
As noted elsewhere, if you get a stream from these and call findFirst()
, the result is the first element in iteration order, which will clearly differ between these sets.
What happened is that by adding and removing a bunch of elements, we've caused the set to increase its internal table size, requiring the elements to be rehashed. The original elements end up in different relative positions in the new table, even after the new elements have been removed.
Although HashSets
have no specified iteration order, the order is likely to be repeatable (and even predictable) if the set is initialized with the same contents in the same way every time. We thus say that the stream from a set has no defined encounter order, even though the order is often the same each time.
Note that in JDK 9, the new immutable sets (and maps) are actually randomized, so their iteration orders will change from run to run, even if they are initialized the same way every time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With