I have a list (~200 elements) of objects that contains only a few unique objects (~20 elements), and I want to keep only the unique values. Between list.stream().collect(Collectors.toSet()) and list.stream().distinct().collect(Collectors.toList()), which is more efficient with respect to latency and memory consumption?
The toList() method of the Collectors class is a static method. It returns a Collector that accumulates the input elements into a new List. It makes no guarantees about the type, mutability, serializability, or thread-safety of the returned list; if you need more control, use toCollection(Supplier) instead.
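As a minimal sketch of the toCollection(Supplier) alternative mentioned above, the following picks the list implementation explicitly. The class and method names (ToCollectionExample, toArrayList) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ToCollectionExample {
    // Collects into an explicitly chosen list type instead of
    // whatever implementation Collectors.toList() happens to return.
    static ArrayList<String> toArrayList(List<String> input) {
        return input.stream().collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        ArrayList<String> copy = toArrayList(Arrays.asList("a", "b", "a"));
        copy.add("c"); // safe: the result is known to be a mutable ArrayList
        System.out.println(copy);
    }
}
```

The same pattern works with any Collection supplier, for example LinkedHashSet::new if you want uniqueness and a stable iteration order.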
In Kotlin, the toSet() function available on collections removes duplicates while preserving the original order of the items.
The Java Stream distinct() method returns a new stream consisting of the distinct elements (compared with equals()). It is useful for removing duplicate elements from a collection before processing them.
collect() is one of the terminal operations of the Java 8 Stream API. It performs a mutable reduction (a "fold") over the elements of a stream: repackaging them into a data structure, applying additional logic, concatenating them, and so on.
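For reference, the two pipelines from the question can be put side by side as in the sketch below. The class and method names (UniqueValuesDemo, viaToSet, viaDistinct) are hypothetical, and String elements stand in for the asker's objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class UniqueValuesDemo {
    // Variant 1: the collector deduplicates; the resulting Set has no defined order.
    static Set<String> viaToSet(List<String> list) {
        return list.stream().collect(Collectors.toSet());
    }

    // Variant 2: deduplicate inside the pipeline, then collect into a List.
    static List<String> viaDistinct(List<String> list) {
        return list.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(viaToSet(input));    // three elements, order unspecified
        System.out.println(viaDistinct(input)); // [b, a, c]
    }
}
```

Both variants rely on equals() and hashCode() of the element type, so custom objects need consistent implementations of both.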
While the answer is pretty obvious (don't bother with speed and memory consumption for such a small number of elements, and mind that one returns a Set and the other a List), there are some small details that are interesting, in my opinion.
Suppose you are streaming from a source that is already known to be distinct; in that case your .distinct() operation will be a no-op, because there is no need to actually do anything.
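One way to see whether a source is "known to be distinct" is to check whether its spliterator reports the Spliterator.DISTINCT characteristic, which the stream pipeline uses for this kind of optimization. The sketch below only inspects that characteristic; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

class DistinctCharacteristicDemo {
    // True if the collection's spliterator advertises that its elements are distinct.
    static boolean reportsDistinct(Collection<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.DISTINCT);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 2, 3));
        Set<Integer> set = new HashSet<>(list);
        System.out.println(reportsDistinct(set));  // true: a Set cannot hold duplicates
        System.out.println(reportsDistinct(list)); // false: a List may hold duplicates
    }
}
```

So calling .distinct() on set.stream() costs essentially nothing, while on list.stream() the pipeline has to track which elements it has already seen.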
If you are streaming from a List (which is ordered by design) and there are no intermediate operations (unordered(), for example) that drop the ordering, .distinct() is forced to preserve the encounter order. In the OpenJDK implementation, a parallel ordered pipeline does this by reducing into a LinkedHashSet internally, which is pretty expensive.
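The ordering guarantee itself is easy to observe: on an ordered stream, distinct() keeps the first occurrence of each element in encounter order, while unordered() releases the pipeline from that obligation. A small sketch, with hypothetical class and method names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DistinctOrderDemo {
    // Ordered source: the first occurrence of each element is kept, in encounter order.
    static List<Integer> orderedDistinct(List<Integer> list) {
        return list.stream().distinct().collect(Collectors.toList());
    }

    // unordered() tells the pipeline that encounter order does not matter,
    // so distinct() no longer has to preserve it.
    static Set<Integer> unorderedDistinct(List<Integer> list) {
        return list.stream().unordered().distinct().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(3, 1, 3, 2, 1);
        System.out.println(orderedDistinct(input));   // [3, 1, 2]
        System.out.println(unorderedDistinct(input)); // same elements, order unspecified
    }
}
```

Whether unordered() pays off depends on the pipeline; for a sequential stream of 200 elements it is unlikely to be measurable.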
If you are doing parallel processing, the list.stream().collect(Collectors.toSet()) version will merge multiple HashSets (in Java 9 this has been slightly improved compared to Java 8). .distinct(), on the other hand, will spin up a ConcurrentHashMap when the parallel stream is unordered, keeping all the keys with a dummy Boolean.TRUE value. It also does something interesting to preserve a null that your stream might contain, since ConcurrentHashMap does not accept null keys; even this is handled differently internally in two cases.
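Whatever the internals do, the observable behaviour stays the same in parallel, including for null elements. The following sketch (class and method names are hypothetical) runs both variants on a parallel stream over a list containing null:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ParallelDistinctDemo {
    // Parallel, ordered: encounter order of first occurrences is still preserved.
    static List<String> parallelDistinct(List<String> list) {
        return list.parallelStream().distinct().collect(Collectors.toList());
    }

    // Parallel, explicitly unordered: distinct() may pick any order.
    static Set<String> parallelUnorderedDistinct(List<String> list) {
        return list.parallelStream().unordered().distinct().collect(Collectors.toSet());
    }

    // Parallel toSet(): partial sets are built per chunk and then merged.
    static Set<String> parallelToSet(List<String> list) {
        return list.parallelStream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", null, "b", "a", null);
        System.out.println(parallelDistinct(input));          // [a, null, b]
        System.out.println(parallelUnorderedDistinct(input)); // a, b and null, order unspecified
        System.out.println(parallelToSet(input));             // a, b and null, order unspecified
    }
}
```

For a 200-element list, the overhead of going parallel will most likely outweigh any gain, so this is mainly of interest for much larger inputs.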
A Set (typically a HashSet) consumes more memory than a List (typically an ArrayList) holding the same elements, mainly because of the hash table that it stores. But with so few elements, you will not get a noticeable difference in terms of memory consumption.
What you should care about instead is that these collectors return different things: a List and a Set, each with its own characteristics, particularly in how you access their elements.
So use whichever matches what you want to do with the collection.
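To make the difference in access concrete, here is a short sketch with hypothetical class and method names: the List result supports positional access, whereas the Set result is suited to membership checks.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ListVersusSetAccess {
    // A List gives positional access: here, the first unique element in encounter order.
    static String firstUnique(List<String> input) {
        List<String> unique = input.stream().distinct().collect(Collectors.toList());
        return unique.get(0);
    }

    // A Set has no index, but a hash-based Set answers membership queries quickly.
    static boolean containsValue(List<String> input, String value) {
        Set<String> unique = input.stream().collect(Collectors.toSet());
        return unique.contains(value);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(firstUnique(input));        // b
        System.out.println(containsValue(input, "c")); // true
    }
}
```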