 

Java Streams | groupingBy same elements

I have a stream of words and I would like to sort them according to the occurrence of same elements (=words).

e.g.: {hello, world, hello}

to

Map<String, List<String>>

hello, {hello, hello}

world, {world}

What I have so far:

Map<Object, List<String>> list = streamofWords.collect(Collectors.groupingBy(???));

Problem 1: The stream seems to lose the information that it is processing Strings, therefore the compiler forces me to change the type to Map<Object, List<String>>.

Problem 2: I don't know what to put inside the parentheses to group by the same occurrence. I know that I am able to process single elements within the lambda expression, but I have no idea how to reach "outside" each element to check for equality.

Thank You

SklogW asked Apr 29 '15 14:04


2 Answers

To get a Map<String, List<String>>, you just need to tell the groupingBy collector that you want to group the values by identity, i.e. with the function x -> x.

// assumes: import static java.util.stream.Collectors.*;
Map<String, List<String>> occurrences = 
     streamOfWords.collect(groupingBy(str -> str));
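For reference, here is a minimal, self-contained sketch of the identity grouping using the question's sample words. The class name GroupByIdentity and the method groupWords are made up for illustration. Note that when the stream is declared as Stream<String>, the result type is Map<String, List<String>>, so falling back to Object should not be needed.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupByIdentity {
    // Hypothetical helper: groups equal words together, using the word itself as the key.
    static Map<String, List<String>> groupWords(Stream<String> words) {
        return words.collect(Collectors.groupingBy(str -> str));
    }

    public static void main(String[] args) {
        Map<String, List<String>> occurrences =
            groupWords(Stream.of("hello", "world", "hello"));
        System.out.println(occurrences.get("hello")); // [hello, hello]
        System.out.println(occurrences.get("world")); // [world]
    }
}
```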

However, this is a bit useless, as you can see: you have the same information twice. You should look into a Map<String, Long> instead, where the value indicates the number of occurrences of the String in the Stream.

Map<String, Long> occurrences = 
     streamOfWords.collect(groupingBy(str -> str, counting()));

Basically, instead of a groupingBy that returns the values as a List, you use the downstream collector counting() to say that you want to count the number of times each value appears.
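A small runnable sketch of the counting variant, again with the sample words from the question. The names WordCount and countWords are only illustrative.

```java
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordCount {
    // Hypothetical helper: maps each distinct word to the number of times it occurs.
    static Map<String, Long> countWords(Stream<String> words) {
        return words.collect(
            Collectors.groupingBy(str -> str, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> occurrences =
            countWords(Stream.of("hello", "world", "hello"));
        System.out.println(occurrences.get("hello")); // 2
        System.out.println(occurrences.get("world")); // 1
    }
}
```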

Your sort requirement implies that you should end up with a Map<Long, List<String>> (what if different Strings appear the same number of times?). As the default groupingBy collector returns a HashMap in practice, it has no notion of ordering, but you could store the elements in a TreeMap instead.
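One possible way to get there, sketched under the assumption that a two-step approach is acceptable: first count the words, then regroup the (word, count) entries by count into a TreeMap so the keys come out in ascending order. The names WordsByFrequency and byFrequency are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordsByFrequency {
    // Hypothetical helper: count -> words that occur that many times, sorted by count.
    static TreeMap<Long, List<String>> byFrequency(Stream<String> words) {
        // Step 1: word -> number of occurrences.
        Map<String, Long> counts = words.collect(
            Collectors.groupingBy(str -> str, Collectors.counting()));
        // Step 2: regroup the entries by their count, collecting into a TreeMap.
        return counts.entrySet().stream().collect(
            Collectors.groupingBy(
                Map.Entry::getValue,
                TreeMap::new,
                Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }

    public static void main(String[] args) {
        System.out.println(byFrequency(Stream.of("hello", "world", "hello")));
        // {1=[world], 2=[hello]}
    }
}
```

When several words share a count, their order inside the list depends on the iteration order of the intermediate map and should not be relied on.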


I've tried to summarize a bit what I've said in the comments.

You seem to have trouble understanding how str -> str can tell whether "hello" and "world" are different.

First of all, str -> str is a function: for an input x it yields a value f(x). For example, f(x) = x + 2 is a function that for any value x returns x + 2.

Here we are using the identity function, that is f(x) = x. When you collect the elements from the pipeline into the Map, this function is called first to obtain the key from each value. So in your example, you have 3 elements ("hello" appears twice), for which the identity function yields:

f("hello") = "hello"
f("world") = "world"

So far so good.

Now when collect() is called, the function is applied to every value in the stream and the result is used as the key in the Map. If the key does not exist yet, a new List containing the value is created for it. If the key already exists, the value is added to the List already mapped to that key. That's why you get a Map<String, List<String>> at the end.
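To make that mechanism concrete, here is a hand-written loop that behaves roughly like the grouping described above. It is a simplified model for illustration only, not the actual JDK implementation; ManualGrouping and group are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ManualGrouping {
    // Simplified model of grouping: apply the key function to each value and
    // append the value to the list stored under the resulting key.
    static <T, K> Map<K, List<T>> group(List<T> values, Function<T, K> keyFunction) {
        Map<K, List<T>> result = new HashMap<>();
        for (T value : values) {
            K key = keyFunction.apply(value);
            // Create the list on first sight of the key, then add the value to it.
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped =
            group(Arrays.asList("hello", "world", "hello"), str -> str);
        System.out.println(grouped.get("hello")); // [hello, hello]
        System.out.println(grouped.get("world")); // [world]
    }
}
```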

Let's take another example. Now the stream contains the values "hello", "world" and "hey" and the function that we want to apply to group the elements is str -> str.substring(0, 2), that is, the function that takes the first two characters of the String.

Similarly, we have:

f("hello") = "he"
f("world") = "wo"
f("hey") = "he"

Here you see that both "hello" and "hey" yield the same key when the function is applied, and hence they are grouped in the same List when collected, so the final result is:

"he" -> ["hello", "hey"]
"wo" -> ["world"]
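The same example as a runnable sketch (PrefixGrouping and groupByPrefix are illustrative names). It assumes every word has at least two characters, since substring(0, 2) throws a StringIndexOutOfBoundsException for shorter strings.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PrefixGrouping {
    // Hypothetical helper: groups words by their first two characters.
    // Assumes each word is at least two characters long.
    static Map<String, List<String>> groupByPrefix(Stream<String> words) {
        return words.collect(
            Collectors.groupingBy(str -> str.substring(0, 2)));
    }

    public static void main(String[] args) {
        Map<String, List<String>> grouped =
            groupByPrefix(Stream.of("hello", "world", "hey"));
        System.out.println(grouped.get("he")); // [hello, hey]
        System.out.println(grouped.get("wo")); // [world]
    }
}
```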

To draw an analogy with mathematics, you could take any non-injective function, such as f(x) = x². For x = -2 and x = 2 we have f(x) = 4. So if we grouped integers by this function, -2 and 2 would end up in the same "bag".
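The analogy translates directly into code. A short sketch, with SquareGrouping and groupBySquare as made-up names and the input values chosen only for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SquareGrouping {
    // Hypothetical helper: groups integers by their square, so x and -x share a key.
    static Map<Integer, List<Integer>> groupBySquare(Stream<Integer> numbers) {
        return numbers.collect(Collectors.groupingBy(x -> x * x));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> grouped = groupBySquare(Stream.of(-2, 2, 3));
        System.out.println(grouped.get(4)); // [-2, 2]
        System.out.println(grouped.get(9)); // [3]
    }
}
```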

Looking at the source code won't help you to understand what's going on at first. It's useful if you want to know how it's implemented under the hood. But try first to think of the concept with a higher level of abstraction and then maybe things will become clearer.

Hope it helps! :)

Alexis C. answered Nov 13 '22 08:11


The key extractor you are looking for is the identity function:

Map<String, List<String>> list = streamofWords.collect(Collectors.groupingBy(Function.identity()));

EDIT added explanation:

  • Function.identity() returns a Function whose single abstract method (apply) does nothing more than return the argument it receives.
  • Collectors.groupingBy(Function<S, K> keyExtractor) provides a collector that collects all elements of the stream into a Map<K, List<S>>. It uses the given keyExtractor to inspect each stream element of type S and derive a key of type K from it. That key is then used to look up (or create) the list in the result map to which the element is added.
flo answered Nov 13 '22 08:11