I can collect a list of words into a bag (a.k.a. multi-set):
Map<String, Long> bag =
Arrays.asList("one o'clock two o'clock three o'clock rock".split(" "))
.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
However, the entries of the bag are not guaranteed to be in any particular order. For example,
{rock=1, o'clock=3, one=1, three=1, two=1}
I can put them into a list and then sort them using my own value comparator:
ArrayList<Entry<String, Long>> list = new ArrayList<>(bag.entrySet());
Comparator<Entry<String, Long>> valueComparator = new Comparator<Entry<String, Long>>() {
@Override
public int compare(Entry<String, Long> e1, Entry<String, Long> e2) {
return e2.getValue().compareTo(e1.getValue());
}
};
Collections.sort(list, valueComparator);
This gives the desired result:
[o'clock=3, rock=1, one=1, three=1, two=1]
Is there a more elegant way to do this? I'm sure it's a problem many people have solved. Is there something built into the Java Streams API that I can use?
You don't need to write a comparator yourself; there is already one for this task: Map.Entry.comparingByValue. It creates a comparator that compares map entries by their values. In this case we want the values in reverse order, so we can use
Map.Entry.comparingByValue(Comparator.reverseOrder())
as the comparator. Your code could then become
Collections.sort(list, Map.Entry.comparingByValue(Comparator.reverseOrder()));
without the custom comparator.
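As a self-contained sketch of that suggestion (the class and method names are only for illustration), the question's code reduces to counting the words and sorting a copy of the entry set with the built-in comparator. Because Collections.sort is stable, entries with equal counts stay in the HashMap's unspecified iteration order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class SortEntriesByValue {

    // Counts the words, then sorts a copy of the entries by descending count.
    // Collections.sort is stable, so entries with equal counts keep the
    // (unspecified) iteration order of the HashMap returned by groupingBy.
    static List<Map.Entry<String, Long>> sortedByCount(String text) {
        Map<String, Long> bag = Arrays.asList(text.split(" "))
                .stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        List<Map.Entry<String, Long>> list = new ArrayList<>(bag.entrySet());
        Collections.sort(list, Map.Entry.comparingByValue(Comparator.reverseOrder()));
        return list;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> list =
                sortedByCount("one o'clock two o'clock three o'clock rock");
        System.out.println(list); // the first entry is o'clock=3
    }
}
```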
To sort the resulting Map by its values, you could also use a Stream pipeline. Also, instead of calling Arrays.asList("...".split(" ")).stream(), you may want to call Pattern.compile(" ").splitAsStream("...") if you have long Strings to handle.
Map<String, Long> bag =
Pattern.compile(" ")
.splitAsStream("one o'clock two o'clock three o'clock rock")
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
Map<String, Long> sortedBag =
bag.entrySet()
.stream()
.sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
.collect(Collectors.toMap(
Map.Entry::getKey,
Map.Entry::getValue,
// keys in a Map are already unique, so this merge function is never called
(v1, v2) -> { throw new IllegalStateException(); },
LinkedHashMap::new
));
This code creates a Stream of the map's entries, sorts it in reverse order of value, and collects the result into a LinkedHashMap to preserve the encounter order.
Output:
{o'clock=3, rock=1, one=1, three=1, two=1}
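Put together with the imports it needs, the two-step pipeline might look like the following sketch (class and method names are hypothetical). Only the placement of o'clock=3 first is guaranteed; the relative order of the words counted once follows HashMap iteration order, which is unspecified and may differ from the output shown above.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SortBagByValue {

    static Map<String, Long> sortedBag(String text) {
        // Step 1: count each word; groupingBy returns a HashMap with no defined order.
        Map<String, Long> bag = Pattern.compile(" ")
                .splitAsStream(text)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Step 2: sort the entries by descending count and keep that order in a LinkedHashMap.
        return bag.entrySet()
                .stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        // keys coming from a Map are unique, so this never runs
                        (v1, v2) -> { throw new IllegalStateException("duplicate key"); },
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // o'clock=3 is printed first
        System.out.println(sortedBag("one o'clock two o'clock three o'clock rock"));
    }
}
```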
Alternatively, you might look into the StreamEx library, with which you could write:
Map<String, Long> bag =
StreamEx.split("one o'clock two o'clock three o'clock rock", " ")
.sorted()
.runLengths()
.reverseSorted(Map.Entry.comparingByValue())
.toCustomMap(LinkedHashMap::new);
This code first sorts the words so that equal words are adjacent, then calls runLengths(). This method collapses adjacent equal elements into an EntryStream<String, Long>, where the value is the number of times the element appeared in a row. For example, on the Stream ["foo", "foo", "bar"], this method would produce the Stream [Entry("foo", 2), Entry("bar", 1)]. Finally, the entries are sorted in descending order of value and collected into a LinkedHashMap.
Note that this gives the correct result without needing two separate Stream pipelines.
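To see why the words must be sorted before runLengths(), here is a plain-JDK sketch of the run-length idea. The helper runLengths below is hypothetical and is not StreamEx's implementation; it only counts runs of adjacent equal elements, so unsorted input splits one word across several runs.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class RunLengthSketch {

    // Hypothetical helper, not StreamEx's implementation: collapses runs of
    // adjacent equal elements into (element, count) entries, in encounter order.
    static <T> List<Map.Entry<T, Long>> runLengths(List<T> items) {
        List<Map.Entry<T, Long>> runs = new ArrayList<>();
        T current = null;
        long count = 0;
        for (T item : items) {
            if (count > 0 && Objects.equals(item, current)) {
                count++; // same as the previous element: extend the current run
            } else {
                if (count > 0) {
                    // a different element ends the previous run
                    runs.add(new AbstractMap.SimpleImmutableEntry<T, Long>(current, count));
                }
                current = item;
                count = 1;
            }
        }
        if (count > 0) {
            runs.add(new AbstractMap.SimpleImmutableEntry<T, Long>(current, count)); // last run
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(runLengths(Arrays.asList("foo", "foo", "bar"))); // [foo=2, bar=1]
        // Unsorted input: "foo" ends up in two separate runs.
        System.out.println(runLengths(Arrays.asList("foo", "bar", "foo"))); // [foo=1, bar=1, foo=1]
    }
}
```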
If you're open to using a third-party library that has a built-in Bag type, you can do the following with Eclipse Collections:
Bag<String> bag =
Bags.mutable.with("one o'clock two o'clock three o'clock rock".split(" "));
ListIterable<ObjectIntPair<String>> pairs = bag.topOccurrences(bag.sizeDistinct());
Assert.assertEquals(PrimitiveTuples.pair("o'clock", 3), pairs.getFirst());
Assert.assertEquals(PrimitiveTuples.pair("rock", 1), pairs.getLast());
System.out.println(pairs);
The output of this is:
[o'clock:3, two:1, one:1, three:1, rock:1]
While the pairs are sorted by their occurrence counts, there is no predictable order among keys with the same count. If you would like a predictable order for the keys, you can use a SortedBag instead.
Bag<String> bag =
SortedBags.mutable.with("one o'clock two o'clock three o'clock rock".split(" "));
ListIterable<ObjectIntPair<String>> pairs = bag.topOccurrences(bag.sizeDistinct());
Assert.assertEquals(PrimitiveTuples.pair("o'clock", 3), pairs.getFirst());
Assert.assertEquals(PrimitiveTuples.pair("two", 1), pairs.getLast());
System.out.println(pairs);
The output of this is:
[o'clock:3, one:1, rock:1, three:1, two:1]
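For comparison, if you would rather stay with the JDK, a similar predictable tie order can be sketched by breaking ties on the key with thenComparing. This assumes alphabetical (natural String) order is the tie-break you want; the class and method names are illustrative.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TieBreakByKey {

    static Map<String, Long> sortedBag(String text) {
        // Descending by count; ties broken by the natural (alphabetical) order of the key.
        Comparator<Map.Entry<String, Long>> byCountThenKey =
                Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey());

        return Stream.of(text.split(" "))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet()
                .stream()
                .sorted(byCountThenKey)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, // never called: keys are already unique
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // {o'clock=3, one=1, rock=1, three=1, two=1}
        System.out.println(sortedBag("one o'clock two o'clock three o'clock rock"));
    }
}
```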
If you want to use Pattern.splitAsStream as Brian suggested, you can change the Eclipse Collections code as follows to work with Streams, using Collectors.toCollection:
Bag<String> bag =
Pattern.compile(" ").splitAsStream("one o'clock two o'clock three o'clock rock")
.collect(Collectors.toCollection(TreeBag::new));
ListIterable<ObjectIntPair<String>> pairs = bag.topOccurrences(bag.sizeDistinct());
Assert.assertEquals(PrimitiveTuples.pair("o'clock", 3), pairs.getFirst());
Assert.assertEquals(PrimitiveTuples.pair("two", 1), pairs.getLast());
System.out.println(pairs);
Note: I am a committer for Eclipse Collections.