
Java 8 Streams: Map the same object multiple times based on different properties

I was presented with an interesting problem by a colleague of mine, and I was unable to find a neat and pretty Java 8 solution. The problem is to stream through a list of POJOs and then collect them in a map based on multiple properties, so that each POJO occurs in the map multiple times.

Imagine the following POJO:

private static class Customer {
    public String first;
    public String last;

    public Customer(String first, String last) {
        this.first = first;
        this.last = last;
    }

    public String toString() {
        return "Customer(" + first + " " + last + ")";
    }
}

Set it up as a List<Customer>:

// The list of customers
List<Customer> customers = Arrays.asList(
        new Customer("Johnny", "Puma"),
        new Customer("Super", "Mac"));

Alternative 1: Use a Map outside of the "stream" (or rather outside forEach).

// Alt 1: not pretty since the resulting map is "outside" of
// the stream. If parallel streams are used it must be
// a ConcurrentHashMap
Map<String, Customer> res1 = new HashMap<>();
customers.stream().forEach(c -> {
    res1.put(c.first, c);
    res1.put(c.last, c);
});

Alternative 2: Create map entries and stream them, then flatMap them. IMO it is a bit too verbose and not so easy to read.

// Alt 2: A bit verbose, and "new AbstractMap.SimpleEntry" feels like
// a "hard" dependency on AbstractMap
Map<String, Customer> res2 =
        customers.stream()
                .map(p -> {
                    Map.Entry<String, Customer> firstEntry = new AbstractMap.SimpleEntry<>(p.first, p);
                    Map.Entry<String, Customer> lastEntry = new AbstractMap.SimpleEntry<>(p.last, p);
                    return Stream.of(firstEntry, lastEntry);
                })
                .flatMap(Function.identity())
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue));

Alternative 3: This is the "prettiest" code I have come up with so far, but it uses the three-arg version of reduce, and the third argument is a bit dodgy, as discussed in this question: Purpose of third argument to 'reduce' function in Java 8 functional programming. Furthermore, reduce does not seem like a good fit for this problem since it mutates its accumulator, and parallel streams may not work with the approach below.

// Alt 3: using reduce. Not so pretty
Map<String, Customer> res3 = customers.stream().reduce(
        new HashMap<>(),
        (m, p) -> {
            m.put(p.first, p);
            m.put(p.last, p);
            return m;
        },
        (m1, m2) -> m2 /* <- NOT USED UNLESS PARALLEL */);

If the above code is printed like this:

System.out.println(res1);
System.out.println(res2);
System.out.println(res3);

The result would be:

{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}
{Super=Customer(Super Mac), Johnny=Customer(Johnny Puma), Mac=Customer(Super Mac), Puma=Customer(Johnny Puma)}

So, now to my question: how should I, in idiomatic Java 8, stream through the List<Customer> and collect it as a Map<String, Customer> where each Customer is mapped under two keys (first AND last), i.e. each Customer occurs twice in the map? I do not want to use any 3rd-party libraries, and I do not want to use a map outside of the stream as in alternative 1. Are there any other nice alternatives?

The full code can be found on hastebin for simple copy-paste to get the whole thing running.

wassgren asked Feb 13 '15 20:02

People also ask

Can a stream be used multiple times?

From the documentation: A stream should be operated on (invoking an intermediate or terminal stream operation) only once. A stream implementation may throw IllegalStateException if it detects that the stream is being reused. So the answer is no, streams are not meant to be reused.
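A minimal sketch of this behavior (the class name and sample values are illustrative):

```java
import java.util.stream.Stream;

public class StreamReuse {
    public static void main(String[] args) {
        Stream<String> s = Stream.of("a", "b", "c");
        s.forEach(System.out::println); // terminal operation consumes the stream

        try {
            // Reusing the consumed stream triggers an IllegalStateException
            s.forEach(System.out::println);
        } catch (IllegalStateException e) {
            System.out.println("Stream cannot be reused");
        }
    }
}
```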

Are Java 8 streams lazy?

The Java 8 Streams API is built around a 'process only on demand' strategy and hence supports laziness. In the Java 8 Streams API, the intermediate operations are lazy, and their internal processing model is optimised to process large amounts of data with high performance.
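Laziness can be observed with peek, which reports which elements the pipeline actually touches (the class name and sample values below are illustrative):

```java
import java.util.stream.Stream;

public class LazyStreams {
    public static void main(String[] args) {
        // peek lets us watch which elements flow through the pipeline
        String first = Stream.of("Johnny", "Super", "Mac")
                .peek(s -> System.out.println("inspecting " + s))
                .filter(s -> s.startsWith("S"))
                .findFirst()
                .orElse("none");
        // Only "Johnny" and "Super" are inspected; "Mac" is never touched,
        // because findFirst short-circuits as soon as a match is found.
        System.out.println("result: " + first);
    }
}
```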

Are Java streams multi threaded?

Java 8 introduced the concept of Streams as an efficient way of carrying out bulk operations on data. And parallel Streams can be obtained in environments that support concurrency. These streams can come with improved performance – at the cost of multi-threading overhead.
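As a small illustration (class name and values are made up for this sketch), the same pipeline can run sequentially or in parallel; because summing is associative, the result is identical either way:

```java
import java.util.Arrays;
import java.util.List;

public class ParallelSum {
    public static void main(String[] args) {
        List<Integer> nums = Arrays.asList(1, 2, 3, 4, 5);
        // Sequential pipeline
        int sequential = nums.stream().mapToInt(Integer::intValue).sum();
        // Same pipeline, but split across worker threads
        int parallel = nums.parallelStream().mapToInt(Integer::intValue).sum();
        System.out.println(sequential + " " + parallel); // both print 15
    }
}
```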

What is the purpose of the map method in Java 8 streams?

Java 8 Stream's map method is an intermediate operation that consumes a single element from the input Stream and produces a single element on the output Stream. It is simply used to convert a Stream of one type to a Stream of another.
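For example, a one-to-one conversion from Stream<String> to Stream<Integer> (class name and sample values are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MapExample {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Johnny", "Super");
        // map: one input element -> one output element, String -> Integer
        List<Integer> lengths = names.stream()
                .map(String::length)
                .collect(Collectors.toList());
        System.out.println(lengths); // [6, 5]
    }
}
```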


1 Answer

I think your alternatives 2 and 3 can be re-written to be more clear:

Alternative 2:

Map<String, Customer> res2 = customers.stream()
    .flatMap(c -> Stream.of(c.first, c.last)
        .map(k -> new AbstractMap.SimpleImmutableEntry<>(k, c)))
    .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));

Alternative 3: Your code abuses reduce by mutating the HashMap. To do mutable reduction, use collect:

Map<String, Customer> res3 = customers.stream()
    .collect(
        HashMap::new,
        (m, c) -> { m.put(c.first, c); m.put(c.last, c); },
        HashMap::putAll
    );

Note that these are not identical. Alternative 2 will throw an exception if there are duplicate keys while Alternative 3 will silently overwrite the entries.
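A small sketch of this difference, using plain strings in place of the Customer POJO (the class name and sample values are made up; "Mac" occurs twice so the key collides):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        // toMap without a merge function throws on the duplicate "Mac" key
        try {
            Stream.of("Mac", "Johnny", "Mac")
                    .collect(Collectors.toMap(s -> s, String::length));
        } catch (IllegalStateException e) {
            System.out.println("toMap throws on duplicate keys");
        }

        // The three-arg collect simply calls put again, overwriting silently
        Map<String, Integer> m = Stream.of("Mac", "Johnny", "Mac")
                .collect(HashMap::new,
                        (map, s) -> map.put(s, s.length()),
                        HashMap::putAll);
        System.out.println(m.size()); // 2 entries: "Mac" and "Johnny"
    }
}
```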

If overwriting entries in case of duplicate keys is what you want, I would personally prefer Alternative 3. It is immediately clear to me what it does. It most closely resembles the iterative solution. I would expect it to be more performant as Alternative 2 has to do a bunch of allocations per customer with all that flatmapping.

However, Alternative 2 has a huge advantage over Alternative 3 by separating the production of entries from their aggregation. This gives you a great deal of flexibility. For example, if you want to change Alternative 2 to overwrite entries on duplicate keys instead of throwing an exception, you would simply add (a,b) -> b to toMap(...). If you decide you want to collect matching entries into a list, all you would have to do is replace toMap(...) with groupingBy(...), etc.
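Both variations can be sketched briefly (the class name and sample values are illustrative, again using plain strings for the keys):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toMap;

public class FlexibleCollect {
    public static void main(String[] args) {
        // Overwrite on duplicate keys by adding a merge function to toMap:
        Map<String, Integer> lastWins = Stream.of("Mac", "Johnny", "Mac")
                .collect(toMap(s -> s, String::length, (a, b) -> b));
        System.out.println(lastWins.size()); // 2 - no exception despite the duplicate

        // Or collect all matches per key into a list with groupingBy:
        Map<Integer, List<String>> byLength = Stream.of("Mac", "Johnny", "Super")
                .collect(groupingBy(String::length));
        System.out.println(byLength);
    }
}
```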

Misha answered Sep 21 '22 22:09