Which way is the most efficient to process a distinct and sorted collection?
Set<MyObj> ret = new TreeSet<>();
for (Foo foo : foos)
ret.add(new MyObj(foo));
List<MyObj> ret = foos.stream().map(MyObj::new)
.distinct().sorted()
.collect(Collectors.toList());
Set<MyObj> ret = foos.stream().map(MyObj::new)
.collect(Collectors.toCollection(TreeSet::new));
The first way seems to be the least elegant yet easy to read.
The second way makes me fear that distinct
and sorted
will process the stream two times.
The last way feels okay, but what is the TreeSet overhead in a stream?
Any clues? Thank you.
In Java8 Streams, performance is achieved by parallelism, laziness, and using short-circuit operations, but there is a downside as well, and we need to be very cautious while choosing Streams, as it may degrade the performance of your application.
Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops.
For this particular test, streams are about twice as slow as collections, and parallelism doesn't help (or either I'm using it the wrong way?).
Judging from the Stream API source code, my initial guess would be: for many items simple stream (2) should be the fastest, outperforming significantly the TreeSet version (1), then TreeSet stream (3) should follow a little bit behind. For short data sets (1) would probably be better than (2) which is better than (3), because Stream creation adds some overhead. The distinct-sorted stream works roughly like this:
Set<MyObj> set = new HashSet<>();
List<MyObj> result = new ArrayList<>();
for (Foo foo : foos) {
MyObj myObj = new MyObj(foo);
if(set.add(myObj))
result.add(myObj);
}
result.sort(null);
return result;
Let's add this implementation as (4). It uses HashSet
to check whether results are distinct, adding them into intermediate container, then sorts it. This should be faster than maintaining TreeSet
as we don't need to keep order after every insertion (which TreeSet
does, possibly rebalancing the tree). Actual Stream implementation would be somewhat less efficient, because it cannot sort the resulting list in-place. Instead it creates intermediate container, sorts it, then dumps the result into the final list using series of list.add
calls.
The result could depend on number of elements in initial foos
collection and also on number of distinct elements. I call it diversity: diversity = 1 means that roughly every element is different; diversity = 0.5 means that every element is repeated roughly two times. Also the result may heavily depend on initial element order: sorting algorithms could be order of magnitude faster when input data is presorted or nearly presorted.
So let's parameterize our tests in the following way:
foos
): 10, 1000, 100000I assume that Foo
contains only one int
field. Of course the result might heavily depend on compareTo
, equals
and hashCode
implementation of Foo
class, because versions (2) and (4) use equals
and hashCode
while versions (1) and (3) use compareTo
. We will do it simply:
@Override
public int hashCode() {
return x;
}
@Override
public boolean equals(Object o) {
return this == o || (o != null && getClass() == o.getClass() && x == ((Foo) o).x);
}
@Override
public int compareTo(Foo o) {
return Integer.compare(x, o.x);
}
Presorted elements could be generated via:
foos = IntStream.range(0, size)
.mapToObj(x -> new Foo((int)(x*diversity)))
.collect(Collectors.toList());
Random elements could be generated via:
foos = new Random().ints(size, 0, (int) (size * diversity))
.mapToObj(Foo::new)
.collect(Collectors.toList());
Using JMH 1.13 and JDK 1.8.0_101, VM 25.101-b13 64bit to perform measurements
Presorted (all times are in μs):
diversity size (1) (2) (3) (4)
1 10 0.2 0.5 0.3 0.2
1 1000 48.0 36.0 53.0 24.2
1 100000 14165.7 4759.0 15177.3 3341.6
0.5 10 0.2 0.3 0.2 0.2
0.5 1000 36.9 23.1 41.6 20.1
0.5 100000 11442.1 2819.2 12508.7 2661.3
0.2 10 0.1 0.3 0.2 0.2
0.2 1000 32.0 13.0 29.8 16.7
0.2 100000 8491.6 1969.5 8971.9 1951.7
Not presorted:
diversity size (1) (2) (3) (4)
1 10 0.2 0.4 0.2 0.3
1 1000 72.8 77.4 73.6 72.7
1 100000 21599.9 16427.1 22807.8 16322.2
0.5 10 0.2 0.3 0.2 0.2
0.5 1000 64.8 46.9 69.4 45.5
0.5 100000 20335.2 11190.3 20658.6 10806.7
0.2 10 0.1 0.3 0.2 0.2
0.2 1000 48.0 19.6 56.7 22.2
0.2 100000 16713.0 5533.4 16885.0 5930.6
My initial guesses were in general correct. For presorted data (2) and (4) are times better when we have 100,000 elements. The difference becomes even bigger when we have many duplicates, as they don't increase sorting time and duplicate insertion to HashSet
is much more efficient than duplicate insertion to TreeSet
. For random data the difference is less striking as TreeSet
performance much less depend on the input data order than TimSort algorithm (which is used by Java to sort lists and arrays). For small data sets simple TreeSet
is fast, but using (4) version could be competitive as well.
A source code of benchmark along with raw results is available here.
It is hard to give a good answer without analyzing the input. Anyways I'll share my results:
I made Foo
a container for a single long
, and MyObj
a container for single Foo
. Also I made all tests end with copying data into a plain array. Also I've added two approaches:
4). Simple array
@Benchmark
public void simpleArray(Blackhole bh) {
MyObj[] ret = new MyObj[foos.size()];
for (int i=0;i<foos.size();i++)
ret[i] = new MyObj(foos.get(i));
Arrays.sort(ret);
int lastDistinct = 0;
for (int i = 1; i < ret.length; i++) {
if (ret[i].equals(ret[lastDistinct])) {
continue;
}
lastDistinct++;
ret[lastDistinct] = ret[i];
}
MyObj[] ret2 = new MyObj[lastDistinct + 1];
System.arraycopy(ret, 0, ret2, 0, lastDistinct + 1);
bh.consume(ret2);
}
5). Reversed order of distinct
and order
of (2):
@Benchmark
public void simpleStream_distinctAfterSort(Blackhole bh) {
List<MyObj> ret = foos.stream().map(MyObj::new)
.sorted().distinct()
.collect(Collectors.toList());
bh.consume(ret.toArray(new MyObj[ret.size()]));
}
Tests setup:
public static final int MAX_SIZE = 10_000;
public static final long ELEM_THRESHOLD = MAX_SIZE * 10;
private List<Foo> foos;
@Setup
public void init() throws IOException, IllegalAccessException, InstantiationException {
foos = new ArrayList<>(MAX_SIZE);
for (int i = 0; i < MAX_SIZE; i++) {
foos.add(new Foo(ThreadLocalRandom.current().nextLong(ELEM_THRESHOLD)));
}
}
Now results with different size and threshold:
Size=10_000
Threshold=Size*10
Benchmark Mode Cnt Score Error Units
StreamBenchmark5.enhancedLoop_TreeSet thrpt 2 478,978 ops/s
StreamBenchmark5.simpleArray thrpt 2 591,287 ops/s
StreamBenchmark5.simpleStream thrpt 2 407,556 ops/s
StreamBenchmark5.simpleStream_distinctAfterSort thrpt 2 533,091 ops/s
StreamBenchmark5.treeSetStream thrpt 2 492,709 ops/s
Size=10_000
Threshold=Size/10
StreamBenchmark5.enhancedLoop_TreeSet thrpt 2 937,908 ops/s
StreamBenchmark5.simpleArray thrpt 2 593,983 ops/s
StreamBenchmark5.simpleStream thrpt 2 3344,508 ops/s
StreamBenchmark5.simpleStream_distinctAfterSort thrpt 2 560,652 ops/s
StreamBenchmark5.treeSetStream thrpt 2 1000,585 ops/s
Size=500_000
Threshold=Size*10
Benchmark Mode Cnt Score Error Units
StreamBenchmark5.enhancedLoop_TreeSet thrpt 2 1,809 ops/s
StreamBenchmark5.simpleArray thrpt 2 4,009 ops/s
StreamBenchmark5.simpleStream thrpt 2 2,443 ops/s
StreamBenchmark5.simpleStream_distinctAfterSort thrpt 2 4,141 ops/s
StreamBenchmark5.treeSetStream thrpt 2 2,040 ops/s
Size=500_000
Threshold=Size/10
Benchmark Mode Cnt Score Error Units
StreamBenchmark5.enhancedLoop_TreeSet thrpt 2 5,724 ops/s
StreamBenchmark5.simpleArray thrpt 2 4,567 ops/s
StreamBenchmark5.simpleStream thrpt 2 19,001 ops/s
StreamBenchmark5.simpleStream_distinctAfterSort thrpt 2 4,840 ops/s
StreamBenchmark5.treeSetStream thrpt 2 5,407 ops/s
Size=1_000_000
Threshold=Size/100
Benchmark Mode Cnt Score Error Units
StreamBenchmark5.enhancedLoop_TreeSet thrpt 2 4,529 ops/s
StreamBenchmark5.simpleArray thrpt 2 2,402 ops/s
StreamBenchmark5.simpleStream thrpt 2 35,699 ops/s
StreamBenchmark5.simpleStream_distinctAfterSort thrpt 2 2,232 ops/s
StreamBenchmark5.treeSetStream thrpt 2 4,889 ops/s
As you can see depending on the amount of duplicates preferable algorithm changes. The most balanced approach is TreeSet
(3), however the fastest one is almost always the simple stream (with order
and distinct
located corresponding to input data).
Here's the test source if you are willing to play a bit. You'll need JMH.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With