Get object with max frequency from Java 8 stream

I have an object with city and zip fields, let's call it Record.

public class Record {
    private String zip;
    private String city;

    //getters and setters
}

Now, I have a collection of these objects, and I group them by zip using the following code:

final Collection<Record> records; //populated collection of records
final Map<String, List<Record>> recordsByZip = records.stream()
    .collect(Collectors.groupingBy(Record::getZip));

So, now I have a map where the key is the zip and the value is a list of Record objects with that zip.

What I want to get now is the most common city for each zip.

recordsByZip.forEach((zip, entries) -> {
    final String mostCommonCity = //get most common city for these records
});

I would like to do this using only stream operations. For example, I am able to get a map of the frequency of each city by doing this:

recordsByZip.forEach((zip, entries) -> {
    final Map<String, Long> frequencyMap = entries.stream()
        .map(Record::getCity)
        .filter(StringUtils::isNotBlank)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
});

But I would like to be able to do a single-line stream operation that will just return the most frequent city.

Are there any Java 8 stream gurus out there that can work some magic on this?

Here is an ideone sandbox if you'd like to play around with it.

asked Sep 16 '16 18:09 by Andrew Mairose


2 Answers

You could have the following:

final Map<String, String> mostFrequentCities =
  records.stream()
         .collect(Collectors.groupingBy(
            Record::getZip,
            Collectors.collectingAndThen(
              Collectors.groupingBy(Record::getCity, Collectors.counting()),
              map -> map.entrySet().stream().max(Map.Entry.comparingByValue()).get().getKey()
            )
         ));

This groups the records by zip and then, within each zip, by city, counting how many records each city has. The resulting map of city counts for each zip is then post-processed to keep only the city with the maximum count.
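For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same nested `groupingBy` idea. The class name `MostCommonCityByZip`, the nested `Record` stand-in, and the sample data are made up for illustration. It also assumes that records with a null or blank city should be skipped before grouping, as in the question's frequency-map snippet, which means a zip whose records all have blank cities will not appear in the result. If two cities tie for a zip, which one is returned is unspecified.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MostCommonCityByZip {

    // Hypothetical stand-in for the Record class from the question.
    static final class Record {
        private final String zip;
        private final String city;

        Record(String zip, String city) {
            this.zip = zip;
            this.city = city;
        }

        String getZip() { return zip; }
        String getCity() { return city; }
    }

    static Map<String, String> mostCommonCityByZip(List<Record> records) {
        return records.stream()
            // Assumption: drop null/blank cities up front; groupingBy rejects null keys.
            .filter(r -> r.getCity() != null && !r.getCity().trim().isEmpty())
            .collect(Collectors.groupingBy(
                Record::getZip,
                Collectors.collectingAndThen(
                    // Inner grouping: city -> number of records for this zip.
                    Collectors.groupingBy(Record::getCity, Collectors.counting()),
                    // Every zip group holds at least one record, so max() is never empty here.
                    counts -> counts.entrySet().stream()
                        .max(Map.Entry.comparingByValue())
                        .map(Map.Entry::getKey)
                        .get())));
    }

    public static void main(String[] args) {
        List<Record> records = Arrays.asList(
            new Record("45202", "Cincinnati"),
            new Record("45202", "Cincy"),
            new Record("45202", "Cincinnati"),
            new Record("41011", "Covington"),
            new Record("41011", ""));

        Map<String, String> result = mostCommonCityByZip(records);
        System.out.println(result.get("45202")); // prints "Cincinnati"
        System.out.println(result.get("41011")); // prints "Covington"
    }
}
```

The `filter` step is the only real change from the snippet above; without it, a record with a null city would make `groupingBy` throw a `NullPointerException`.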

answered Oct 19 '22 07:10 by Tunaki


I think a Multiset is a good fit for this kind of question. Here is the code using abacus-util:

Stream.of(records).map(e -> e.getCity()).filter(N::notNullOrEmpty)
      .toMultiset().maxOccurrences().get().getKey();
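For comparison, roughly the same "most frequent element" step can be sketched with the JDK alone. This is not abacus-util's API, just a plain-streams equivalent under a few assumptions: the class and method names are hypothetical, the input is already a list of city names (for the question's data you would map with `Record::getCity` first), and null or blank names are skipped. It returns an `Optional` because the input may contain no usable city.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class MostFrequentCity {

    // Hypothetical helper: returns the most frequent non-blank city, if there is one.
    static Optional<String> mostFrequent(List<String> cities) {
        return cities.stream()
            .filter(c -> c != null && !c.trim().isEmpty())
            // Count occurrences per city, much like a multiset would.
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream()
            // Pick the entry with the highest count; ties resolve in unspecified order.
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Cincinnati", "Cincy", "", "Cincinnati");
        System.out.println(mostFrequent(cities).orElse("<none>")); // prints "Cincinnati"
    }
}
```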

Disclosure: I'm the developer of AbacusUtil.

answered Oct 19 '22 09:10 by user_3380739