Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Equivalent to stream distinct using a custom comparator

If I have the following List:

List<String> list = Arrays.asList("hello", "world", "hello");

And I apply the following (Java8):

list.stream().distinct().collect(Collectors.toString());

Then I would get a list with "hello" and "world".

However, in my case, I have a list of a type (from an external api) where I want to "bypass" the equals Method, ideally with a comparator, as it doesn't cover what I need.

Assume this class looks like this:

public class Point {
    float x;
    float y;
    //getters and setters omitted
}

In this case, I would like two points that cover a certain criteria to be defined as equal, for instance (30, 20) and (30.0001, 19.999).

A custom comparator could do the trick, but I have found no API that does what the distinct() in Java8 Stream does, but with a comparator (or similar pattern).

Any thoughts? I know I could write such a function, but I would rather like the elegant way of using existing apis... I have no restriction with external libraries (guava, apache-commons, etc. are welcome if they have a comfortable way of doing what I need).

like image 481
Martin Avatar asked Jan 11 '16 17:01

Martin


1 Answers

HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.

public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

Streams don't support hashing strategies but Eclipse Collections does. It has sets and maps that support hashing strategies as well as overloads of methods like distinct() that take hashing strategies.

This would work well for Strings. For example, here's how we could get all distinct Strings ignoring case.

MutableList<String> strings = Lists.mutable.with("Hello", "world", "HELLO", "World");
assertThat(
    strings.distinct(HashingStrategies.fromFunction(String::toLowerCase)),
    is(equalTo(Lists.immutable.with("Hello", "world"))));

Or you can write the hashing strategy by hand to avoid garbage creation.

HashingStrategy<String> caseInsensitive = new HashingStrategy<String>()
{
    @Override
    public int computeHashCode(String string)
    {
        int hashCode = 0;
        for (int i = 0; i < string.length(); i++)
        {
            hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
        }
        return hashCode;
    }

    @Override
    public boolean equals(String string1, String string2)
    {
        return string1.equalsIgnoreCase(string2);
    }
};

assertThat(
    strings.distinct(caseInsensitive),
    is(equalTo(Lists.immutable.with("Hello", "world"))));

This could work for Points too, but only if you can group all points within non-overlapping regions to have the same hashcode. If you're using a Comparator defined to return 0 when two Points are close enough, then you can run into transitivity problems. For example, Points A, B, and C can fall along a line with A and C both close to B but far from each other. Still, if this is a useful concept to you, we'd welcome a pull request adding ListIterable.distinct(Comparator) to the API.

Note: I am a committer for Eclipse Collections.

like image 71
Craig P. Motlin Avatar answered Oct 12 '22 23:10

Craig P. Motlin