I have some code as follows:
Map<RiskFactor, RiskFactorChannelData> updateMap =
    updates.stream()
        .filter(this::updatedValueIsNotNull) // Remove null updated values
        .collect(Collectors.toMap(
            u -> u.getUpdatedValue().getKey(), // then merge into a map of key->value.
            Update::getUpdatedValue,
            (a, b) -> b)); // If two values have the same key then take the second value
Specifically I want to take the values from the list and put them into the map. That all works perfectly. My concern though is with ordering.
For example if the list has:
a1, b1, a2
How do I ensure that the final map contains:
a->a2
b->b1
Instead of
a->a1
b->b1
The incoming list is ordered, and stream().filter() should have maintained the order, but I can't see anything in the documentation of Collectors.toMap about ordering of the inputs.
Is this safe in the general case or have I just been lucky on my test cases so far? Am I going to be JVM dependent and at risk of this changing in the future?
This is very simple to guarantee if I just write a for loop, but the "fuzziness" of potential stream behavior is making me concerned.
I'm not planning to use parallel for this; I'm purely seeking to understand the behavior in the case of a sequential, non-parallel stream that reaches toMap.
The term “most recent value” is a bit misleading. Since you want the last value according to encounter order, the answer is that toMap will respect the encounter order.
Its documentation refers to Map.merge to explain the semantics of the merge function, but unfortunately, that documentation is a bit thin too. It doesn’t explicitly mention the fact that this function is invoked with (oldValue, newValue); that can only be deduced from the code example.
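To make the (oldValue, newValue) argument order visible, here is a minimal sketch that calls Map.merge directly. The class name MergeOrderDemo and the helper mergedValue are made up for illustration; only Map.merge itself comes from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

class MergeOrderDemo {
    // Hypothetical helper: inserts "a1" under key "a", then merges "a2" into the same key.
    static String mergedValue() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "a1");
        // The remapping function receives the value already in the map first,
        // and the value passed to merge() second.
        map.merge("a", "a2", (oldValue, newValue) -> oldValue + "," + newValue);
        return map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(mergedValue()); // prints "a1,a2"
    }
}
```

With a merge function of (a, b) -> b, the second argument is therefore the value that arrived later.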
toMap’s documentation further states:
The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are merged into the Map in encounter order, using toConcurrentMap(Function, Function, BinaryOperator, Supplier) may offer better parallel performance.
So it explicitly directs you to a different collector if encounter order is not required. Generally, the built-in collectors provided by Collectors are unordered only if explicitly stated, which is the case only for the “…Concurrent…” collectors and the toSet() collector.
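As a small illustration of this, the sketch below runs the a1, b1, a2 example from the question through toMap with the (a, b) -> b merge function. It uses plain String keys and values instead of the question's RiskFactor types, and the names ToMapLastWinsDemo, entry, lastWins and lastWinsParallel are assumptions made for this example.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapLastWinsDemo {
    // Hypothetical helper to build key/value pairs without extra types.
    static Map.Entry<String, String> entry(String key, String value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Sequential stream over an ordered list: on a key collision the later value wins.
    static Map<String, String> lastWins(List<Map.Entry<String, String>> updates) {
        return updates.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> b));
    }

    // Same collector on a parallel stream; toMap is not an unordered collector,
    // so the merge still follows the encounter order of the list.
    static Map<String, String> lastWinsParallel(List<Map.Entry<String, String>> updates) {
        return updates.parallelStream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, (a, b) -> b));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> updates =
                Arrays.asList(entry("a", "a1"), entry("b", "b1"), entry("a", "a2"));
        Map<String, String> result = lastWins(updates);
        System.out.println(result.get("a")); // prints "a2"
        System.out.println(result.get("b")); // prints "b1"
    }
}
```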
It is safe: Collection.stream() creates a sequential stream.
I suggest taking a look at Collectors.toMap: in case of key collisions, the merge function decides which value is kept. In your case you should keep the more recent one.
The part you're interested in is (a, b) -> b, where you arbitrarily choose the second element; that is where you should choose the more recent one.
I think your problem comes from the fact that you are not sure about the processing order. If you want to continue using streams (instead of a for loop), you could enforce sequential processing by adding .sequential() after .stream().
Another way, which I would prefer, is to add a timestamp to RiskFactorChannelData and compare timestamps in the merge function; then you could even use a parallel stream.
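The timestamp idea could be sketched roughly as follows. The question does not show RiskFactorChannelData, so the Reading class, its fields, and the latestByKey method are stand-ins invented for this example; the point is only that the merge function compares timestamps instead of relying on argument position.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class TimestampMergeDemo {
    // Hypothetical stand-in for a value type that carries its own timestamp.
    static final class Reading {
        private final String key;
        private final String value;
        private final long timestamp;

        Reading(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }

        String getKey() { return key; }
        String getValue() { return value; }
        long getTimestamp() { return timestamp; }
    }

    // Keeps, per key, the reading with the greatest timestamp.
    // On equal timestamps the later element in encounter order is kept.
    static Map<String, Reading> latestByKey(List<Reading> readings) {
        return readings.parallelStream()
                .collect(Collectors.toMap(
                        Reading::getKey,
                        Function.identity(),
                        (a, b) -> b.getTimestamp() >= a.getTimestamp() ? b : a));
    }

    public static void main(String[] args) {
        // The newest "a" reading deliberately comes first in the list.
        List<Reading> readings = Arrays.asList(
                new Reading("a", "a2", 3L),
                new Reading("b", "b1", 2L),
                new Reading("a", "a1", 1L));
        Map<String, Reading> latest = latestByKey(readings);
        System.out.println(latest.get("a").getValue()); // prints "a2"
        System.out.println(latest.get("b").getValue()); // prints "b1"
    }
}
```

With this merge function the result no longer depends on the order in which the elements are encountered, as long as the timestamps for a given key are distinct.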