
Using the Java 8 Streams API, can sorted() be relied upon when calling Collectors.toSet()?

This is the implementation of the java.util.stream.Collectors class's toSet() method:

public static <T>
Collector<T, ?, Set<T>> toSet() {
    return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_UNORDERED_ID);
}

As we can see, it uses a HashSet and calls add. From the HashSet documentation, "It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time."

In the following code, a List of String is streamed, sorted and collected into a Set:

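import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        // Sort the stream, then collect into whatever Set toSet() supplies
        Set<String> strings = Arrays.asList("c", "a", "b")
                .stream()
                .sorted()
                .collect(Collectors.toSet());
        System.out.println(strings.getClass());
        System.out.println(strings);
    }
}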

This provides the output:

class java.util.HashSet
[a, b, c]

The output is sorted. What I think is happening here is that although the contract provided by the HashSet documentation specifies that ordering is not something it provides, the implementation happens to add in order. I suppose this could change in future versions / vary between JVMs and that a wiser approach would be to do something like Collectors.toCollection(TreeSet::new).

Can sorted() be relied upon when calling Collectors.toSet()?

Additionally, what exactly does "it does not guarantee that the order will remain constant over time" mean? (I suppose add, remove, the resizing of the underlying array?)

Robert Bain asked Oct 20 '17 21:10



2 Answers

To answer that question, you have to know a bit about how HashSet is implemented. As the name suggests, a HashSet is implemented using a hash table. Basically, a hash table is an array that is indexed by element hashes. A hash function (in Java, an object's hash is calculated by object.hashCode()) is basically a function that meets a few criteria:

  • it is (relatively) quick to compute for a given element
  • two objects that .equals() each other have identical hashes
  • there is a low probability that different items have the same hash

So, when you meet a HashSet that is "sorted" (which is understood as "the iterator preserves the natural order of elements"), this is due to a couple of coincidences:

  • the natural order of elements respects the natural order of their hashCodes
  • the elements land in separate buckets of the hash table (no collisions), and the bucket indices increase along with the hash codes

If you look into the String class's hashCode() method, you will see that for one-letter strings, the hash code corresponds to the Unicode index (code point) of the letter. So in this specific case, as long as those hash codes happen to map to increasing bucket indices, as they do for "a", "b" and "c", the elements will be sorted. However, this is a huge coincidence and

  • will not hold for any other sort order
  • will not hold for classes whose hashCodes do not follow their natural ordering
  • will not hold for hash tables with collisions

and moreover, this has nothing to do with the fact that sorted() was called on the stream - it's simply due to the way hashCode() is implemented and therefore the ordering of the hash table. Therefore, the simple answer to the question is "no".
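To make the coincidence concrete, here is a minimal sketch of where one-letter strings end up in the table. The bucketIndex helper is hypothetical: it mirrors how OpenJDK 8's HashMap derives an index (spread the hash, then mask by capacity minus one), and it assumes the default capacity of 16. Both are implementation details, not part of any documented contract.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BucketIndexSketch {
    // Hypothetical helper: mirrors OpenJDK 8's HashMap indexing
    // (an implementation detail that other JVMs or versions may change).
    public static int bucketIndex(Object key, int capacity) {
        int h = key.hashCode();
        int spread = h ^ (h >>> 16);     // mix the high bits into the low bits
        return spread & (capacity - 1);  // capacity is assumed to be a power of two
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "p");
        for (String s : letters) {
            System.out.println(s + " hashCode=" + s.hashCode()
                    + " bucket=" + bucketIndex(s, 16));
        }
        // "a", "b", "c" map to buckets 1, 2, 3, but "p" maps to bucket 0,
        // so a HashSet is free to iterate "p" before "a" even after sorted().
        // The printed order here is not guaranteed either way.
        System.out.println(Stream.of("a", "p").sorted().collect(Collectors.toSet()));
    }
}
```

In other words, the sorted-looking output for "a", "b", "c" depends on which letters you pick, not on the call to sorted().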

Piotr Wilkin answered Nov 18 '22 13:11


The answer is no. Once you have added the items to a HashSet, you cannot rely on any order. From the JDK source code (HashSet.java):

/**
 * Returns an iterator over the elements in this set.  The elements
 * are returned in no particular order.
 *
 * @return an Iterator over the elements in this set
 * @see ConcurrentModificationException
 */
public Iterator<E> iterator() {
    return map.keySet().iterator();
}

Now, in previous versions of the JDK, even though an order wasn't guaranteed, you'd usually get the items in what looked like insertion order (unless the class of the objects implements hashCode(), in which case you get the order that is dictated by hashCode()). More precisely, that apparent order is either the order of creation of the objects or the order of invocation of hashCode() on the objects. As @Holgar mentions in the comments, in HotSpot it's the latter. And you can't even count on that, since there are exceptions to this as well: the sequential number is not the only ingredient in the hashCode generator.

I recently heard a talk by Stuart Marks (the guy who's responsible for a rewrite of a major part of Collections in Java 9), and he said that they've added randomization to the iteration order of Sets (those created by the new set factories) in Java 9. If you want to hear the session, the part where he talks about sets starts here. Good talk, highly recommended by the way!

So even if you used to count on iteration order of Sets, once you move to Java 9 you should stop doing so.
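As a rough illustration, assuming Java 9 or later, the sketch below builds a set with the Set.of factory and prints it twice: once as-is, where the order is unspecified, and once through a sorted copy. The class name and the sortedCopy helper are made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class SetOfOrderSketch {
    // Hypothetical helper: copy the set into a list and sort it, so the
    // result no longer depends on the set's iteration order.
    public static List<String> sortedCopy(Set<String> set) {
        List<String> copy = new ArrayList<>(set);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> letters = Set.of("c", "a", "b");
        // Iteration order of Set.of(...) is unspecified and may differ between runs.
        System.out.println(letters);
        // The sorted copy is deterministic.
        System.out.println(sortedCopy(letters)); // [a, b, c]
    }
}
```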

All that said, if you need order you should consider using a SortedSet (such as TreeSet) or a LinkedHashSet.
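A minimal sketch of both options, using Collectors.toCollection as the question itself suggests. The class and method names are invented for illustration. A TreeSet keeps its elements sorted on its own, so sorted() is unnecessary there; a LinkedHashSet keeps insertion order, so it preserves whatever order sorted() produced on a sequential stream.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class OrderedSetSketch {
    // TreeSet orders elements by natural ordering, regardless of insertion order.
    public static Set<String> intoTreeSet(List<String> input) {
        return input.stream()
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // LinkedHashSet remembers insertion order, so the sorted stream order survives.
    public static Set<String> sortedIntoLinkedHashSet(List<String> input) {
        return input.stream()
                .sorted()
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("pear", "apple", "fig", "banana");
        System.out.println(intoTreeSet(fruits));             // [apple, banana, fig, pear]
        System.out.println(sortedIntoLinkedHashSet(fruits)); // [apple, banana, fig, pear]
    }
}
```

If you later add more elements, the TreeSet stays sorted, while the LinkedHashSet simply appends them at the end.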

Nir Alfasi answered Nov 18 '22 13:11