Since I use streams a great deal, some of them dealing with a large amount of data, I thought it would be a good idea to pre-allocate my collection-based collectors with an approximate size to prevent expensive reallocation as the collection grows. So I came up with this, and similar ones for other collection types:
public static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
    return Collectors.toCollection(() -> new HashSet<>(initialCapacity));
}
It is used like this:
Set<Foo> fooSet = myFooStream.collect(toSetSized(100000));
My concern is that the implementation of Collectors.toSet() sets a Characteristics enum that Collectors.toCollection() does not: Characteristics.UNORDERED. There is no convenient variation of Collectors.toCollection() to set the desired characteristics beyond the default, and I can't copy the implementation of Collectors.toSet() because of visibility issues. So, to set the UNORDERED characteristic I'm forced to do something like this:
static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
    return Collector.of(
        () -> new HashSet<>(initialCapacity),
        Set::add,
        (c1, c2) -> {
            c1.addAll(c2);
            return c1;
        },
        new Collector.Characteristics[]{IDENTITY_FINISH, UNORDERED});
}
So here are my questions:
1. Is this my only option for creating an unordered collector for something as simple as a custom toSet()?
2. If I want this to work ideally, is it necessary to apply the unordered characteristic? I've read a question on this forum where I learned that the unordered characteristic is no longer back-propagated into the Stream. Does it still serve a purpose?
First of all, the UNORDERED characteristic of a Collector is there to aid performance and nothing else. There is nothing wrong with a Collector that does not depend on the encounter order but does not declare that characteristic.
Whether this characteristic has an impact depends on the stream operations themselves and on implementation details. While the current implementation may not draw much advantage from it, due to the difficulties with back-propagation, that doesn't imply that future versions won't. Of course, a stream that is already unordered is not affected by the UNORDERED characteristic of the Collector. And not all stream operations have the potential to benefit from it.
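To see what a given collector actually reports, its characteristics() set can be inspected directly. The following is a minimal sketch; the class name CharacteristicsProbe and its two helper methods are made up for illustration. Note that toSet() is documented as an unordered collector, whereas what toCollection() reports is an implementation detail that could differ between JDK versions.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical helper class, only for illustration.
class CharacteristicsProbe {

    // Characteristics reported by the built-in toSet() collector.
    static Set<Collector.Characteristics> ofToSet() {
        return Collectors.toSet().characteristics();
    }

    // Characteristics reported by toCollection(...) with a HashSet supplier.
    static Set<Collector.Characteristics> ofToCollection() {
        return Collectors.<String, Set<String>>toCollection(HashSet::new).characteristics();
    }

    public static void main(String[] args) {
        // Print both sets so they can be compared on the JDK at hand.
        System.out.println("toSet:        " + ofToSet());
        System.out.println("toCollection: " + ofToCollection());
    }
}
```

Running this on your own JDK shows whether the two collectors differ in the way described in the question.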
So the more important question is how important it is not to prevent such potential optimizations (perhaps in the future).
Note that there are other unspecified implementation details affecting the potential optimizations when it comes to your second variant. The toCollection(Supplier) collector has unspecified inner workings and only guarantees to provide a final result of the type produced by the Supplier. In contrast, Collector.of(() -> new HashSet<>(initialCapacity), Set::add, (c1, c2) -> { c1.addAll(c2); return c1; }, IDENTITY_FINISH, UNORDERED) defines precisely how the collector ought to work, and may therefore also hinder internal optimizations of collection-producing collectors in future versions.
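As a side note on the explicit variant: the Collector.of overload without a finisher is documented to add IDENTITY_FINISH by itself, so passing UNORDERED alone should be enough. A minimal sketch, assuming a made-up class name ExplicitSetCollector:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;

// Hypothetical holder class, only for illustration.
class ExplicitSetCollector {

    // Fully spelled-out collector: supplier, accumulator, and combiner are fixed here.
    static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
        return Collector.of(
            () -> new HashSet<T>(initialCapacity),                  // pre-sized container
            Set::add,                                               // accumulate one element
            (left, right) -> { left.addAll(right); return left; },  // merge partial results
            Collector.Characteristics.UNORDERED);                   // IDENTITY_FINISH is added by of(...)
    }
}
```

This still pins down the inner workings of the collector, which is the drawback described above.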
So a way to specify the characteristics without touching the other aspects of a Collector would be the best solution, but as far as I know, there is no simple way offered by the existing API. But it's easy to build such a facility yourself:
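public static <T, A, R> Collector<T, A, R> characteristics(
        Collector<T, A, R> c, Collector.Characteristics... ch) {
    Set<Collector.Characteristics> o = c.characteristics();
    if (!o.isEmpty()) {
        // merge the collector's existing characteristics with the requested ones
        o = EnumSet.copyOf(o);
        Collections.addAll(o, ch);
        // use a fresh array: toArray(ch) would null-pad ch if it were longer than the set
        ch = o.toArray(new Collector.Characteristics[0]);
    }
    return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(), ch);
}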
With that method, it's easy to write, e.g.
HashSet<String> set = stream
    .collect(characteristics(toCollection(() -> new HashSet<>(capacity)), UNORDERED));
or to provide your own factory method:
public static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
    return characteristics(toCollection(() -> new HashSet<>(initialCapacity)), UNORDERED);
}
This limits the effort necessary to provide your characteristics (if it is a recurring problem), so it won’t hurt to provide them, even if you don’t know how much impact it will have.
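Putting the pieces together, here is a self-contained sketch of the approach. The class name SizedCollectors and the parameter name extra are my own; the wrapper always merges into a fresh EnumSet, which is a small variation on the idea rather than anything the API prescribes.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical utility class, only for illustration.
class SizedCollectors {

    // Wraps an existing collector, adding characteristics without touching its functions.
    static <T, A, R> Collector<T, A, R> characteristics(
            Collector<T, A, R> c, Collector.Characteristics... extra) {
        Set<Collector.Characteristics> merged = EnumSet.noneOf(Collector.Characteristics.class);
        merged.addAll(c.characteristics());   // keep what the original collector declares
        Collections.addAll(merged, extra);    // add the requested characteristics
        return Collector.of(c.supplier(), c.accumulator(), c.combiner(), c.finisher(),
                merged.toArray(new Collector.Characteristics[0]));
    }

    // Pre-sized, unordered set collector built on top of toCollection(...).
    static <T> Collector<T, ?, Set<T>> toSetSized(int initialCapacity) {
        Collector<T, ?, Set<T>> base =
                Collectors.toCollection(() -> new HashSet<>(initialCapacity));
        return characteristics(base, Collector.Characteristics.UNORDERED);
    }

    public static void main(String[] args) {
        Set<Integer> set = IntStream.range(0, 1000).boxed()
                .parallel()
                .collect(toSetSized(2048));
        System.out.println(set.size());
    }
}
```

Assigning the base collector to a local variable first keeps the type inference simple; the nested one-liner form should behave the same way.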