Java Streams: How to do an efficient "distinct and sort"?

Tags:

Let's assume I've got a Stream<T> and want to get only distinct elements and sorted.

The naïve approach would be to do just the following:

Stream.of(...)     .sorted()     .distinct()

or, maybe the other way around:

Stream.of(...)     .distinct()     .sorted()

Since the implementation of both of them is not really accessible by the JDK's source code I was just wondering about possible memory consumption and performance implications.

Or would it be even more efficient to write my own filter as the following?

Stream.of(...)     .sorted()     .filter(noAdjacentDuplicatesFilter())  public static Predicate<Object> noAdjacentDuplicatesFilter() {     final Object[] previousValue = {new Object()};      return value -> {         final boolean takeValue = !Objects.equals(previousValue[0], value);         previousValue[0] = value;         return takeValue;     }; }

880

asked May 05 '17 13:05

Michaela Maura Elschner

1 Answers

When you chain a distinct() operation after sorted(), the implementation will utilize the sorted nature of the data and avoid building an internal HashSet, which can be demonstrated by the following program

public class DistinctAndSort {     static int COMPARE, EQUALS, HASHCODE;     static class Tracker implements Comparable<Tracker> {         static int SERIAL;         int id;         Tracker() {             id=SERIAL++/2;         }         public int compareTo(Tracker o) {             COMPARE++;             return Integer.compare(id, o.id);         }         public int hashCode() {             HASHCODE++;             return id;         }         public boolean equals(Object obj) {             EQUALS++;             return super.equals(obj);         }     }     public static void main(String[] args) {         System.out.println("adjacent sorted() and distinct()");         Stream.generate(Tracker::new).limit(100)               .sorted().distinct()               .forEachOrdered(o -> {});         System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",                           COMPARE, EQUALS, HASHCODE);         COMPARE=EQUALS=HASHCODE=0;         System.out.println("now with intermediate operation");         Stream.generate(Tracker::new).limit(100)             .sorted().map(x -> x).distinct()             .forEachOrdered(o -> {});         System.out.printf("compareTo: %d, EQUALS: %d, HASHCODE: %d%n",                           COMPARE, EQUALS, HASHCODE);     } }

which will print

adjacent sorted() and distinct() compareTo: 99, EQUALS: 99, HASHCODE: 0 now with intermediate operation compareTo: 99, EQUALS: 100, HASHCODE: 200

The intermediate operation, as simple as map(x -> x), can’t be recognized by the Stream implementation, hence, it must assume that the elements might not be sorted in respect to the mapping function’s result.

There is no guaranty that this kind of optimization happens, however, it is reasonable to assume that the developers of the Stream implementation will not remove that optimization and even try to add more optimizations, so rolling your own implementation will prevent your code from benefiting from future optimizations.

Further, what you have created is a “stateful predicate”, which is strongly discouraged, and, of course, will break when being used with a parallel stream.

If you don’t trust the Stream API to perform this operation efficiently enough, you might be better off implementing this particular operation without the Stream API.

174

answered Sep 22 '22 01:09

Holger

Related questions
                            
                                Where to find Java 6 JSSE/JCE Source Code?
                            
                                How to get maven to timeout earlier while downloading dependencies?
                            
                                Need memory efficient way to store tons of strings (was: HAT-Trie implementation in java)
                            
                                Java GPU programming [closed]
                            
                                Java variable placed on stack or heap
                            
                                How to handle database migrations in Spring Boot with Hibernate?
                            
                                Creating Java Web Service using Google AppEngine
                            
                                What is a good Java library for Parts-Of-Speech tagging? [closed]
                            
                                Android: CursorLoader, LoaderManager, SQLite
                            
                                Why does the Java increment operator allow narrowing operations without explicit cast? [duplicate]
                            
                                Java 8 Streams: Map the same object multiple times based on different properties
                            
                                @Id @GeneratedValue but set own ID value
                            
                                Using multiple cores/processors when compiling Java
                            
                                Only want to ever see Allman style code and save K&R style code
                            
                                Gson: How to change output of Enum
                            
                                Tomcat startup fails due to 'java.net.SocketException Invalid argument' on Mac OS X
                            
                                What does setDefaultMaxPerRoute and setMaxTotal mean in HttpClient?
                            
                                (class file version 53.0), Java Runtime versions up to 52.0 Visual studio code [duplicate]
                            
                                How to register multiple servlets in web.xml in one Spring application
                            
                                Spring OAuth Authorization Server behind Spring Cloud Zuul Proxy

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Java Streams: How to do an efficient "distinct and sort"?

Tags:

java

performance

java-8

java-stream

Michaela Maura Elschner

People also ask

1 Answers

Holger

Recent Activity

Donate For Us