
Java 8 Stream Merge Partial Duplicates

I have a POJO that looks something like this:

public class Account {
    private Integer accountId;
    private List<String> contacts;
}

The equals and hashCode methods use only the accountId field to determine uniqueness, so any two Accounts with the same accountId are equal regardless of what contacts contains.

I have a List of accounts, and some of them are duplicates with the same accountId. How do I use the Java 8 Stream API to merge these duplicates together?

For example, the list of accounts contains:

+-----------+----------+
| accountId | contacts |
+-----------+----------+
|         1 | {"John"} |
|         1 | {"Fred"} |
|         2 | {"Mary"} |
+-----------+----------+

And I want it to produce a list of accounts like this:

+-----------+------------------+
| accountId |     contacts     |
+-----------+------------------+
|         1 | {"John", "Fred"} |
|         2 | {"Mary"}         |
+-----------+------------------+
George asked Mar 07 '23 02:03


2 Answers

Use Collectors.toMap with a merge function. Reference: https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function-java.util.function.BinaryOperator-

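import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Lombok generates the constructor and getters; fields become private final.
@lombok.Value
class Account {
    Integer accountId;
    List<String> contacts;
}

List<Account> accounts = new ArrayList<>();
// Fill the list with accounts here.
List<Account> result = new ArrayList<>(accounts.stream()
    .collect(
        // Key by accountId; when two accounts share a key, merge their contacts.
        Collectors.toMap(Account::getAccountId, Function.identity(), (Account account1, Account account2) -> {
            // Requires getContacts() to return a mutable list.
            account1.getContacts().addAll(account2.getContacts());
            // Empties the absorbed duplicate; note this mutates the source object.
            account2.getContacts().clear();
            return account1;
        })
    )
    .values());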
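
If mutating the original Account objects is undesirable, the same toMap idea can be sketched without side effects on the input. This is only an illustration under assumptions: the nested Account class below is a hypothetical hand-written stand-in for the question's POJO (no Lombok), and the four-argument toMap overload with LinkedHashMap is used so that ids come out in first-seen order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.stream.Collectors;

class MergeAccountsSketch {
    // Hypothetical stand-in for the question's Account class.
    static final class Account {
        private final Integer accountId;
        private final List<String> contacts;

        Account(Integer accountId, List<String> contacts) {
            this.accountId = accountId;
            this.contacts = new ArrayList<>(contacts); // defensive, mutable copy
        }

        Integer getAccountId() { return accountId; }
        List<String> getContacts() { return contacts; }
    }

    // Merges accounts sharing an accountId; the input accounts are left untouched.
    static List<Account> merge(List<Account> accounts) {
        return new ArrayList<>(accounts.stream()
            .collect(Collectors.toMap(
                Account::getAccountId,
                // Copy every element so the merge function only mutates copies.
                a -> new Account(a.getAccountId(), a.getContacts()),
                (a, b) -> { a.getContacts().addAll(b.getContacts()); return a; },
                LinkedHashMap::new)) // keeps ids in first-seen order
            .values());
    }

    public static void main(String[] args) {
        List<Account> input = Arrays.asList(
            new Account(1, Arrays.asList("John")),
            new Account(1, Arrays.asList("Fred")),
            new Account(2, Arrays.asList("Mary")));
        for (Account a : merge(input)) {
            System.out.println(a.getAccountId() + " " + a.getContacts());
        }
        // prints:
        // 1 [John, Fred]
        // 2 [Mary]
    }
}
```

The cost of this variant is one extra Account per input element, which is usually acceptable when the input list must stay unchanged.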
balki answered May 15 '23 22:05


A clean Stream API solution can be quite complicated, so perhaps you’re better off with a Collection API solution that has fewer constraints to obey.

HashMap<Integer, Account> tmp = new HashMap<>();
listOfAccounts.removeIf(a -> a != tmp.merge(a.getAccountId(), a, (o,n) -> {
    o.getContacts().addAll(n.getContacts());
    return o;
}));

This directly removes all elements with a duplicate id from the list after having added their contacts to the first account of that id.

Of course, this assumes that the list supports removal and the list returned by getContacts() is a reference to the stored list and supports adding elements.

The solution is built around Map.merge, which adds the specified object if the key doesn’t exist yet, or evaluates the merge function if the key already exists. The merge function returns the old object after having added the contacts, so we can do a reference comparison (a != …) to determine that we have a duplicate that should be removed.
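
To see the Map.merge and removeIf combination run end to end, here is a minimal self-contained sketch using the question's sample data. The nested Account class and the mergeInPlace helper name are assumptions made for the example; the input is wrapped in an ArrayList because, as noted above, the list must support removal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergeInPlaceSketch {
    // Hypothetical stand-in for the question's Account class.
    static final class Account {
        private final Integer accountId;
        private final List<String> contacts;

        Account(Integer accountId, List<String> contacts) {
            this.accountId = accountId;
            this.contacts = new ArrayList<>(contacts); // must be mutable for addAll
        }

        Integer getAccountId() { return accountId; }
        List<String> getContacts() { return contacts; }
    }

    // Removes duplicates from the list after moving their contacts to the first account with that id.
    static void mergeInPlace(List<Account> accounts) {
        Map<Integer, Account> firstById = new HashMap<>();
        accounts.removeIf(a -> a != firstById.merge(a.getAccountId(), a, (oldA, newA) -> {
            oldA.getContacts().addAll(newA.getContacts());
            return oldA; // a different reference than 'a', so 'a' is removed
        }));
    }

    public static void main(String[] args) {
        List<Account> accounts = new ArrayList<>(Arrays.asList(
            new Account(1, Arrays.asList("John")),
            new Account(1, Arrays.asList("Fred")),
            new Account(2, Arrays.asList("Mary"))));
        mergeInPlace(accounts);
        for (Account a : accounts) {
            System.out.println(a.getAccountId() + " " + a.getContacts());
        }
        // prints:
        // 1 [John, Fred]
        // 2 [Mary]
    }
}
```

The first account of each id is stored in the map and returned unchanged by merge, so the reference comparison is false for it and it stays in the list.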

Holger answered May 15 '23 23:05