stream().collect(Collectors.toSet()) vs stream().distinct().collect(Collectors.toList())

Tags: java-8, java-stream

I have a list (~200 elements) of objects with only a few unique objects (~20 elements), and I want to keep only the unique values. Between list.stream().collect(Collectors.toSet()) and list.stream().distinct().collect(Collectors.toList()), which is more efficient with respect to latency and memory consumption?

asked Feb 26 '18 17:02

Laxmikant

2 Answers

The obvious answer is: don't bother with speed and memory consumption for so few elements, and keep in mind that one version returns a Set and the other a List. Still, there are some interesting small details (interesting IMO).

Suppose you are streaming from a source that is already known to be distinct (a Set, for example). In that case your .distinct() operation will be a NO-OP, because there is no need to actually do anything.
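
One way to see whether a source is "known to be distinct" is to look at the characteristics its spliterator reports. The sketch below checks the Spliterator.DISTINCT flag for an ArrayList and a HashSet. The class and method names are made up for illustration, and the code only shows what the source advertises; whether a given JDK actually skips the work inside .distinct() is an implementation detail it does not prove.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;

class DistinctFlagDemo {

    // Returns true if the collection's spliterator advertises that
    // its elements are already distinct.
    static boolean reportsDistinct(Collection<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.DISTINCT);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "a"));
        Set<String> set = new HashSet<>(list);

        System.out.println(reportsDistinct(list)); // false: a List may hold duplicates
        System.out.println(reportsDistinct(set));  // true: a Set cannot hold duplicates
    }
}
```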

If you are streaming from a List (which is ordered by design) and there are no intermediate operations (unordered, for example) that drop the ordering, .distinct() is forced to preserve the encounter order. In OpenJDK's implementation, a parallel pipeline does this by reducing into a LinkedHashSet internally, which is pretty expensive.
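
The visible effect of that ordering guarantee can be sketched as follows. The class name, method names and sample data are hypothetical. The example only demonstrates the documented behaviour (for an ordered stream, distinct() keeps the first occurrence of each element in encounter order) and says nothing about which data structure is used internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class OrderDemo {

    // distinct() on an ordered stream keeps the first occurrence of each
    // element, in the order the elements were encountered.
    static List<String> uniqueInOrder(List<String> input) {
        return input.stream().distinct().collect(Collectors.toList());
    }

    // toSet() makes no promise about iteration order.
    static Set<String> uniqueAsSet(List<String> input) {
        return input.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig", "apple");

        System.out.println(uniqueInOrder(input)); // [pear, apple, fig]
        System.out.println(uniqueAsSet(input));   // same three elements, order unspecified
    }
}
```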

If you are doing parallel processing, the list.stream().collect(Collectors.toSet()) version will merge multiple HashSets (in Java 9 this has been slightly improved compared to 8). .distinct(), on the other hand, when the stream is unordered, will spin up a ConcurrentHashMap that keeps all the keys with a dummy Boolean.TRUE value. It also does something interesting to preserve a null that your stream might contain, and even this is handled differently internally in the two cases.
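
As a rough sketch of the parallel variants being compared, assuming a list shaped like the one in the question (200 elements, 20 unique values): the class and method names are illustrative only, and the internal structures mentioned above are not observable from this code, only the results are.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDistinctDemo {

    // Parallel and ordered: the result keeps the encounter order of the list.
    static List<Integer> orderedParallel(List<Integer> input) {
        return input.parallelStream().distinct().collect(Collectors.toList());
    }

    // Parallel and explicitly unordered: the same elements come back,
    // but their order is not guaranteed.
    static List<Integer> unorderedParallel(List<Integer> input) {
        return input.parallelStream().unordered().distinct().collect(Collectors.toList());
    }

    // Parallel toSet(): partial sets built by worker threads are merged into one.
    static Set<Integer> parallelToSet(List<Integer> input) {
        return input.parallelStream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // 200 elements, only 20 unique values (0..19), as in the question.
        List<Integer> input = IntStream.range(0, 200)
                .map(i -> i % 20)
                .boxed()
                .collect(Collectors.toList());

        System.out.println(orderedParallel(input));          // 0..19 in ascending order
        System.out.println(unorderedParallel(input).size()); // 20
        System.out.println(parallelToSet(input).size());     // 20
    }
}
```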

answered Jan 01 '23 21:01

Eugene

A Set (typically a HashSet) consumes more memory than a List (typically an ArrayList), mainly because of the hash table that it stores. But with so few elements, you will not get a noticeable difference in memory consumption.
What you should care about instead is that these collectors return different things: a List and a Set, each with its own characteristics, particularly in how you access their elements.
So use the one that matches what you want to do with the collection.
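
To illustrate the difference in access, here is a small sketch with hypothetical names: a List result supports positional access, while a Set result is geared towards membership tests.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ResultTypeDemo {

    // A List allows access by index; this assumes the input is not empty.
    static String firstUnique(List<String> input) {
        List<String> unique = input.stream().distinct().collect(Collectors.toList());
        return unique.get(0);
    }

    // A Set has no index, but answers "is this value present?" directly.
    static boolean containsValue(List<String> input, String value) {
        Set<String> unique = input.stream().collect(Collectors.toSet());
        return unique.contains(value);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "pear", "fig");

        System.out.println(firstUnique(input));            // pear
        System.out.println(containsValue(input, "fig"));   // true
        System.out.println(containsValue(input, "grape")); // false
    }
}
```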

answered Jan 01 '23 22:01

davidxxx
