
Stream.collect(groupingBy(identity(), counting())) and then sort the result by value

I can collect a list of words into a bag (a.k.a. multi-set):

Map<String, Long> bag =
        Arrays.asList("one o'clock two o'clock three o'clock rock".split(" "))
        .stream()
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

However, the entries of the bag are not guaranteed to be in any particular order. For example,

{rock=1, o'clock=3, one=1, three=1, two=1}

I can put them into a list, and then sort them using my implementation of a value comparator:

ArrayList<Entry<String, Long>> list = new ArrayList<>(bag.entrySet());
Comparator<Entry<String, Long>> valueComparator = new Comparator<Entry<String, Long>>() {

    @Override
    public int compare(Entry<String, Long> e1, Entry<String, Long> e2) {
        return e2.getValue().compareTo(e1.getValue());
    }
};
Collections.sort(list, valueComparator);

This gives the desired result:

[o'clock=3, rock=1, one=1, three=1, two=1]

Is there a more elegant way to do this? I'm sure it's a problem many people must have solved. Is there something built into the Java Streams API that I can use?

whistling_marmot asked Jan 18 '16 16:01


2 Answers

You don't need to write your own comparator; one already exists for this task: Map.Entry.comparingByValue. It returns a comparator that compares map entries by their values. In this case, we want the reverse (descending) order, so we could have:

Map.Entry.comparingByValue(Comparator.reverseOrder())

as the comparator. Your code could then become

Collections.sort(list, Map.Entry.comparingByValue(Comparator.reverseOrder()));

without having the custom comparator.
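
For reference, here is a minimal, self-contained sketch of the list-based approach with the built-in comparator. The class and method names (SortByCountDemo, sortedByCount) are made up for illustration, and the order of entries with equal counts is still unspecified, because it depends on the iteration order of the underlying HashMap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; not part of the original question or answer.
class SortByCountDemo {

    // Counts each word, then returns the entries ordered by descending count.
    static List<Map.Entry<String, Long>> sortedByCount(String text) {
        Map<String, Long> bag = Stream.of(text.split(" "))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        List<Map.Entry<String, Long>> list = new ArrayList<>(bag.entrySet());
        // The built-in comparator replaces the anonymous Comparator class.
        list.sort(Map.Entry.comparingByValue(Comparator.reverseOrder()));
        return list;
    }

    public static void main(String[] args) {
        // The entry for o'clock (count 3) comes first; ties follow in no guaranteed order.
        System.out.println(sortedByCount("one o'clock two o'clock three o'clock rock"));
    }
}
```

List.sort is used here instead of Collections.sort; on Java 8 and later the two are interchangeable for this purpose.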


To sort the resulting Map by its values, you could also use a Stream pipeline. Also, instead of calling Arrays.asList("...".split(" ")).stream(), you may want to call Pattern.compile(" ").splitAsStream("...") if you have long Strings to handle, since it streams the tokens without first building an intermediate array.

Map<String, Long> bag =
   Pattern.compile(" ")
          .splitAsStream("one o'clock two o'clock three o'clock rock")
          .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
Map<String, Long> sortedBag = 
    bag.entrySet()
       .stream()
       .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
       .collect(Collectors.toMap(
           Map.Entry::getKey,
           Map.Entry::getValue,
           (v1, v2) -> { throw new IllegalStateException(); },
           LinkedHashMap::new
       ));

This code creates a Stream of the map's entries, sorts it in descending order of value, and collects the result into a LinkedHashMap, which preserves the encounter order.

Output:

{o'clock=3, rock=1, one=1, three=1, two=1}
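
In the output above, the entries with equal counts appear in whatever order the intermediate HashMap happened to yield. If a deterministic order is wanted, one option is to chain a tie-breaker onto the comparator. The following is only a sketch under that assumption; the names SortedBagDemo, countAndSort and byCountThenKey are made up for illustration.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original answer.
class SortedBagDemo {

    static Map<String, Long> countAndSort(String text) {
        Map<String, Long> bag = Pattern.compile(" ")
                .splitAsStream(text)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Descending count first; equal counts fall back to alphabetical key order.
        Comparator<Map.Entry<String, Long>> byCountThenKey =
                Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey());

        return bag.entrySet()
                .stream()
                .sorted(byCountThenKey)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        // Keys are already unique, so a merge would indicate a bug.
                        (v1, v2) -> { throw new IllegalStateException(); },
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // prints {o'clock=3, one=1, rock=1, three=1, two=1}
        System.out.println(countAndSort("one o'clock two o'clock three o'clock rock"));
    }
}
```

The explicit type witness on comparingByValue is there because the compiler cannot infer the entry type once thenComparing is chained onto the call.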

Alternatively, you might look into the StreamEx library, for which you could have:

Map<String, Long> bag =
    StreamEx.split("one o'clock two o'clock three o'clock rock", " ")
            .sorted()
            .runLengths()
            .reverseSorted(Map.Entry.comparingByValue())
            .toCustomMap(LinkedHashMap::new);

This code sorts the Strings so that equal words become adjacent, and then calls runLengths(). This method collapses adjacent equal elements into an EntryStream<String, Long> where the value is the number of times the element appeared consecutively. For example, on the Stream ["foo", "foo", "bar"], this method would produce the Stream [Entry("foo", 2), Entry("bar", 1)]. Finally, this is sorted in descending order of the values and collected into a LinkedHashMap.

Note that this gives the correct result with a single Stream pipeline instead of two distinct ones.
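
To make the runLengths() step described above more concrete, here is a plain-JDK sketch of the same idea: collapsing adjacent equal elements into (element, count) pairs. This is not how StreamEx implements it; the class RunLengthSketch and its list-based runLengths method are hypothetical and only illustrate the behaviour on an already sorted input.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration; StreamEx itself works lazily on streams.
class RunLengthSketch {

    // Collapses adjacent equal elements into (element, run length) pairs.
    static <T> List<Map.Entry<T, Long>> runLengths(List<T> items) {
        List<Map.Entry<T, Long>> runs = new ArrayList<>();
        for (T item : items) {
            int last = runs.size() - 1;
            if (last >= 0 && Objects.equals(runs.get(last).getKey(), item)) {
                // Same as the previous element: extend the current run.
                Map.Entry<T, Long> run = runs.get(last);
                run.setValue(run.getValue() + 1);
            } else {
                // Different element: start a new run of length 1.
                runs.add(new AbstractMap.SimpleEntry<T, Long>(item, 1L));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // prints [foo=2, bar=1]
        System.out.println(runLengths(Arrays.asList("foo", "foo", "bar")));
    }
}
```

Because only adjacent elements are merged, the input has to be sorted first for the run lengths to equal the total counts, which is why the pipeline above calls sorted() before runLengths().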

Tunaki answered Nov 07 '22 10:11


If you're open to using a third-party library that has a Bag type built in, then you can do the following using Eclipse Collections:

Bag<String> bag =
    Bags.mutable.with("one o'clock two o'clock three o'clock rock".split(" "));
ListIterable<ObjectIntPair<String>> pairs = bag.topOccurrences(bag.sizeDistinct());
Assert.assertEquals(PrimitiveTuples.pair("o'clock", 3), pairs.getFirst());
Assert.assertEquals(PrimitiveTuples.pair("rock", 1), pairs.getLast());
System.out.println(pairs);

The output of this is:

[o'clock:3, two:1, one:1, three:1, rock:1]

While the entries are ordered by their occurrence counts, there is no predictable order for the keys when counts are tied. If you would like to have a predictable order for the keys, you can use a SortedBag instead.

Bag<String> bag =
    SortedBags.mutable.with("one o'clock two o'clock three o'clock rock".split(" "));
ListIterable<ObjectIntPair<String>> pairs = bag.topOccurrences(bag.sizeDistinct());
Assert.assertEquals(PrimitiveTuples.pair("o'clock", 3), pairs.getFirst());
Assert.assertEquals(PrimitiveTuples.pair("two", 1), pairs.getLast());
System.out.println(pairs);

The output of this is:

[o'clock:3, one:1, rock:1, three:1, two:1]

If you want to use Pattern.splitAsStream as Brian suggested, then you can change the code as follows to work with Streams using Collectors.toCollection:

Bag<String> bag =
    Pattern.compile(" ").splitAsStream("one o'clock two o'clock three o'clock rock")
        .collect(Collectors.toCollection(TreeBag::new));
ListIterable<ObjectIntPair<String>> pairs = bag.topOccurrences(bag.sizeDistinct());
Assert.assertEquals(PrimitiveTuples.pair("o'clock", 3), pairs.getFirst());
Assert.assertEquals(PrimitiveTuples.pair("two", 1), pairs.getLast());
System.out.println(pairs);

Note: I am a committer for Eclipse Collections.

Donald Raab answered Nov 07 '22 09:11