
Using java streams to put the last encountered value into a map

I have some code as follows:

Map<RiskFactor, RiskFactorChannelData> updateMap = updates.stream()
    .filter(this::updatedValueIsNotNull) // Remove null updated values
    .collect(Collectors.toMap(
        u -> u.getUpdatedValue().getKey(), // then merge into a map of key->value
        Update::getUpdatedValue,
        (a, b) -> b)); // If two values have the same key then take the second value

Specifically I want to take the values from the list and put them into the map. That all works perfectly. My concern though is with ordering.

For example if the list has:

a1, b1, a2

How do I ensure that the final map contains:

a->a2
b->b1

Instead of

a->a1
b->b1

The incoming list is ordered, and stream().filter() should maintain that order, but I can't see anything in the documentation of Collectors.toMap about the ordering of the inputs.

Is this safe in the general case or have I just been lucky on my test cases so far? Am I going to be JVM dependent and at risk of this changing in the future?

This is very simple to guarantee if I just write a for loop, but the "fuzziness" of potential stream behavior is making me concerned.

I'm not planning to use parallel streams for this; I'm purely seeking to understand the behavior of a sequential, non-parallel stream that ends in toMap.

Tim B asked Feb 08 '17 15:02


2 Answers

The term “most recent value” is a bit misleading. Since you want the last value according to encounter order, the answer is that toMap will respect the encounter order.

Its documentation refers to Map.merge to explain the semantics of the merge function, but unfortunately, that documentation is a bit thin too. It doesn’t mention the fact that this function is invoked with (oldValue,newValue) explicitly; it can only be deduced from the code example.
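To make the argument order concrete, here is a minimal sketch that records what Map.merge passes to the remapping function. The class and method names (MergeArgumentOrder, describeMerge) are made up for this illustration and are not part of the question's code.

```java
import java.util.HashMap;
import java.util.Map;

final class MergeArgumentOrder {
    // Merges "a2" into a map that already holds a -> a1 and reports
    // which arguments the remapping function received.
    public static String describeMerge() {
        Map<String, String> map = new HashMap<>();
        map.put("a", "a1");
        StringBuilder seen = new StringBuilder();
        map.merge("a", "a2", (oldValue, newValue) -> {
            seen.append("old=").append(oldValue).append(", new=").append(newValue);
            return newValue; // keep the value that arrived later
        });
        return seen + " -> " + map.get("a");
    }

    public static void main(String[] args) {
        System.out.println(describeMerge()); // old=a1, new=a2 -> a2
    }
}
```

The first argument is the value already in the map and the second is the one being merged in, so a merge function of the form (a, b) -> b keeps the later value.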

toMap’s documentation further states:

The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are merged into the Map in encounter order, using toConcurrentMap(Function, Function, BinaryOperator, Supplier) may offer better parallel performance.

So it explicitly points to a different collector for the case where encounter order is not required. Generally, the built-in collectors provided by Collectors are unordered only if that is explicitly stated, which is the case only for the “…Concurrent…” collectors and the toSet() collector.
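As a rough illustration of the behavior described above, the sketch below runs the a1, b1, a2 example from the question through toMap with a "take the later value" merge function. Plain strings stand in for the question's Update objects, and the names LastValueWins and lastByKey are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LastValueWins {
    // Keys each update by its first character and, on a key collision,
    // keeps the value that comes later in encounter order.
    public static Map<Character, String> lastByKey(List<String> updates) {
        return updates.stream()
                .collect(Collectors.toMap(
                        u -> u.charAt(0),
                        u -> u,
                        (earlier, later) -> later));
    }

    public static void main(String[] args) {
        Map<Character, String> result = lastByKey(Arrays.asList("a1", "b1", "a2"));
        System.out.println(result.get('a')); // a2
        System.out.println(result.get('b')); // b1
    }
}
```

Because the list is ordered and toMap is not declared unordered, the merge function sees a1 before a2, so the map ends up with a->a2 and b->b1.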

Holger answered Oct 21 '22 20:10


It is safe: Collection.stream() creates a sequential stream.

I suggest taking a look at Collectors.toMap: when two elements map to the same key, its merge function decides which value is kept. In your case it should keep the more recent one.

The part you're interested in is (a, b) -> b, where you arbitrarily choose the second element; that is where you should choose the more recent value.

I think your problem comes from not being sure about the processing order. If you want to keep using streams (instead of a for loop), you can enforce sequential processing by adding .sequential() after .stream().

Another way, which I would prefer, is to add a timestamp to RiskFactorChannelData and keep the value with the latest timestamp; that works even with a parallel stream.
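A minimal sketch of the timestamp idea follows. It assumes a hypothetical Stamped class with a timestamp field, since the question does not show RiskFactorChannelData having one, and it assumes timestamps are distinct per key; LatestByTimestamp, latestByKey and newer are likewise illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class LatestByTimestamp {
    // Hypothetical stand-in for RiskFactorChannelData with an added timestamp.
    public static final class Stamped {
        public final String key;
        public final String value;
        public final long timestamp;

        public Stamped(String key, String value, long timestamp) {
            this.key = key;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Picks whichever value carries the larger timestamp, regardless of
    // the order in which the two values are presented.
    private static Stamped newer(Stamped a, Stamped b) {
        return a.timestamp >= b.timestamp ? a : b;
    }

    // Collects the newest value per key; the result does not depend on
    // encounter order, so a parallel stream is acceptable here.
    public static Map<String, Stamped> latestByKey(List<Stamped> updates) {
        return updates.parallelStream()
                .collect(Collectors.toMap(s -> s.key, s -> s, LatestByTimestamp::newer));
    }

    public static void main(String[] args) {
        List<Stamped> updates = Arrays.asList(
                new Stamped("a", "a2", 3L),
                new Stamped("b", "b1", 2L),
                new Stamped("a", "a1", 1L));
        System.out.println(latestByKey(updates).get("a").value); // a2
    }
}
```

Here the merge function compares timestamps instead of relying on argument position, so the list can arrive in any order and a2 still wins for key a.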

freedev answered Oct 21 '22 20:10