I have a problem similar to the one described here, but with two differences: first, I use the Stream API, and second, I already have equals() and hashCode() methods. However, within the stream, equality of Blogs in this context is not the same as defined in the Blog class.
Collection<Blog> elements = x.stream()
    ... // a lot of filter and map stuff
    .peek(p -> System.out.println(p)) // a stream of Blog
    .? // how to remove duplicates - .distinct() doesn't work
I have a class with an equality method, let's call it ContextBlogEqual, with the method

public boolean equal(Blog a, Blog b);

Is there any way to remove all duplicate entries with my current stream approach, based on the ContextBlogEqual#equal method?
I already thought about grouping, but that doesn't work either, because the reason why blogA and blogB are equal isn't just one parameter. Also, I have no idea how I could use .reduce(...), because there is usually more than one element left.
In essence, you either have to define hashCode to make your data work with a hash table, or a total order to make it work with a binary search tree.

For hash tables you'll need to declare a wrapper class which overrides equals and hashCode.
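For illustration, here is a minimal sketch of that wrapper idea. It assumes context equality means "same name and id" (the fields come from the Blog class in the answer below; the wrapper name ContextBlogKey is made up):

static class ContextBlogKey {
    final Blog blog;
    ContextBlogKey(Blog blog) { this.blog = blog; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ContextBlogKey)) return false;
        Blog other = ((ContextBlogKey) o).blog;
        // assumed context equality: only name and id matter, time is ignored
        return blog.id == other.id && Objects.equals(blog.name, other.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(blog.name, blog.id);
    }
}

// given some Collection<Blog> blogs as the source
List<Blog> distinct = blogs.stream()
        .map(ContextBlogKey::new)
        .distinct()              // now backed by the wrapper's equals/hashCode
        .map(k -> k.blog)        // unwrap back to Blog
        .collect(Collectors.toList());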
For binary trees you can define a Comparator<Blog> which respects your equality definition and adds an arbitrary, but consistent, ordering criterion. Then you can collect into a new TreeSet<Blog>(yourComparator).
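And a minimal sketch of the tree variant, under the same assumed "name and id" equality:

// ordering criterion consistent with the assumed context equality
Comparator<Blog> byContext = Comparator
        .comparing((Blog b) -> b.name)
        .thenComparingInt(b -> b.id);

Set<Blog> distinct = blogs.stream()
        .collect(Collectors.toCollection(() -> new TreeSet<>(byContext)));

Note that a TreeSet keeps the first element inserted per comparator-equal group, so encounter order decides which duplicate survives.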
First, please note that an equal(Blog, Blog) method is not enough for most scenarios, as you would need to pairwise compare all the entries, which is not efficient. It's better to define a function which extracts a new key from the blog entry. For example, let's consider the following Blog class:
static class Blog {
    final String name;
    final int id;
    final long time;

    public Blog(String name, int id, long time) {
        this.name = name;
        this.id = id;
        this.time = time;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, id, time);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null || getClass() != obj.getClass())
            return false;
        Blog other = (Blog) obj;
        return id == other.id && time == other.time && Objects.equals(name, other.name);
    }

    @Override
    public String toString() {
        return name + ":" + id + ":" + time;
    }
}
Let's have some test data:
List<Blog> blogs = Arrays.asList(new Blog("foo", 1, 1234),
        new Blog("bar", 2, 1345), new Blog("foo", 1, 1345),
        new Blog("bar", 2, 1345));
List<Blog> distinctBlogs = blogs.stream().distinct().collect(Collectors.toList());
System.out.println(distinctBlogs);
Here distinctBlogs contains three entries: [foo:1:1234, bar:2:1345, foo:1:1345]. Suppose that's undesired because we don't want to compare the time field. The simplest way to create a new key is to use Arrays.asList:
Function<Blog, Object> keyExtractor = b -> Arrays.asList(b.name, b.id);
The resulting keys already have proper equals and hashCode implementations.
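For instance (a quick check; List.equals compares elements pairwise and List.hashCode is computed from the elements):

Object k1 = keyExtractor.apply(new Blog("foo", 1, 1234));
Object k2 = keyExtractor.apply(new Blog("foo", 1, 1345));
System.out.println(k1.equals(k2));                  // true: time is ignored
System.out.println(k1.hashCode() == k2.hashCode()); // true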
Now if you are fine with a terminal operation, you may create a custom collector like this:
List<Blog> distinctByNameId = blogs.stream().collect(
        Collectors.collectingAndThen(Collectors.toMap(
                keyExtractor, Function.identity(),
                (a, b) -> a, LinkedHashMap::new),
            map -> new ArrayList<>(map.values())));
System.out.println(distinctByNameId);
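With the test data above, this prints [foo:1:1234, bar:2:1345]: the second foo:1 entry is dropped because it maps to the same (name, id) key as the first one.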
Here we use keyExtractor to generate the keys, and the merge function (a, b) -> a selects the previously added entry when a repeated key appears. We use LinkedHashMap to preserve the encounter order (omit this parameter if you don't care about order). Finally, we dump the map values into a new ArrayList. You can move such collector creation to a separate method and generalize it:
public static <T> Collector<T, ?, List<T>> distinctBy(
        Function<? super T, ?> keyExtractor) {
    return Collectors.collectingAndThen(
            Collectors.toMap(keyExtractor, Function.identity(), (a, b) -> a, LinkedHashMap::new),
            map -> new ArrayList<>(map.values()));
}
This way, the usage is simpler:
List<Blog> distinctByNameId = blogs.stream()
.collect(distinctBy(b -> Arrays.asList(b.name, b.id)));
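As a final note (not part of the answer above): if you'd rather deduplicate as an intermediate step instead of a terminal collect, a common sketch is a stateful filter; distinctByKey is a hypothetical helper name here:

// requires java.util.concurrent.ConcurrentHashMap and java.util.function.*
public static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(keyExtractor.apply(t)); // add() returns false on repeats
}

List<Blog> distinctByNameId = blogs.stream()
        .filter(distinctByKey(b -> Arrays.asList(b.name, b.id)))
        .collect(Collectors.toList());

Because the predicate relies on side effects, which element of a duplicate group survives is only well-defined for sequential streams.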