
TreeSet vs Java 8 Streams performance

Which is the most efficient way to produce a distinct and sorted collection?

1. Enhanced loop with TreeSet

Set<MyObj> ret = new TreeSet<>();
for (Foo foo : foos)
    ret.add(new MyObj(foo));

2. Simple Stream

List<MyObj> ret = foos.stream().map(MyObj::new)
                      .distinct().sorted()
                      .collect(Collectors.toList());

3. TreeSet Stream

Set<MyObj> ret = foos.stream().map(MyObj::new)
                     .collect(Collectors.toCollection(TreeSet::new));

The first way seems to be the least elegant, yet it is easy to read. The second way makes me fear that distinct and sorted will process the stream twice. The last way feels okay, but what is the overhead of a TreeSet in a stream?

Any clues? Thank you.

Asked Feb 15 '17 by Guillaume F.


People also ask

Does Java 8 stream improve performance?

In Java 8 Streams, performance is achieved through parallelism, laziness, and short-circuit operations. There is a downside as well: we need to be cautious when choosing Streams, as they may degrade the performance of an application.
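
For illustration, here is a minimal sketch of laziness and short-circuiting (the values are made up for the example):

Optional<Integer> first = Stream.of(5, 3, 8, 1, 9)
        .peek(x -> System.out.println("inspecting " + x)) // elements are pulled lazily, one at a time
        .filter(x -> x > 4)
        .findFirst(); // short-circuits: only the first element (5) is ever inspected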

Is Java 8 stream faster than for loop?

Yes, streams are sometimes slower than loops, but they can also be equally fast; it depends on the circumstances. The point to take home is that sequential streams are no faster than loops.

Which is faster stream or collection?

For this particular test, streams are about twice as slow as collections, and parallelism doesn't help (or perhaps I'm using it the wrong way?).


2 Answers

Initial analysis

Judging from the Stream API source code, my initial guess would be: for many items the simple stream (2) should be the fastest, significantly outperforming the TreeSet version (1), with the TreeSet stream (3) following a little behind. For short data sets (1) would probably be better than (2), which is better than (3), because Stream creation adds some overhead. The distinct-sorted stream works roughly like this:

Set<MyObj> set = new HashSet<>();
List<MyObj> result = new ArrayList<>();
for (Foo foo : foos) {
    MyObj myObj = new MyObj(foo);
    if(set.add(myObj))
        result.add(myObj);
}
result.sort(null);
return result;

Let's add this implementation as (4). It uses a HashSet to check whether results are distinct, adds them to an intermediate container, then sorts it. This should be faster than maintaining a TreeSet, as we don't need to keep the elements ordered after every insertion (which a TreeSet does, possibly rebalancing the tree). The actual Stream implementation would be somewhat less efficient, because it cannot sort the resulting list in place. Instead it creates an intermediate container, sorts it, then dumps the result into the final list using a series of list.add calls.
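
To make that concrete, here is a rough sketch of what the distinct-sorted pipeline (2) ends up doing internally (illustrative only, not the real JDK code):

Set<MyObj> seen = new HashSet<>();
List<MyObj> distinctBuffer = new ArrayList<>();      // intermediate container
for (Foo foo : foos) {
    MyObj myObj = new MyObj(foo);
    if (seen.add(myObj))                             // HashSet-based distinct check
        distinctBuffer.add(myObj);
}
MyObj[] sorted = distinctBuffer.toArray(new MyObj[0]);
Arrays.sort(sorted);                                 // sorts a copy, not the final list
List<MyObj> result = new ArrayList<>();
for (MyObj myObj : sorted)
    result.add(myObj);                               // series of list.add calls
return result;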

The result could depend on the number of elements in the initial foos collection and also on the number of distinct elements. I call this diversity: diversity = 1 means that roughly every element is different; diversity = 0.5 means that every element is repeated roughly twice. The result may also depend heavily on the initial element order: sorting algorithms can be an order of magnitude faster when the input data is presorted or nearly presorted.

Experimental setup

So let's parameterize our tests in the following way:

  • size (number of elements in foos): 10, 1000, 100000
  • diversity (fraction of different ones): 1, 0.5, 0.2
  • presorted: true, false

I assume that Foo contains only one int field. Of course the result might depend heavily on the compareTo, equals and hashCode implementations of the Foo class, because versions (2) and (4) use equals and hashCode while versions (1) and (3) use compareTo. We will keep them simple:

@Override
public int hashCode() {
    return x;
}

@Override
public boolean equals(Object o) {
    return this == o || (o != null && getClass() == o.getClass() && x == ((Foo) o).x);
}

@Override
public int compareTo(Foo o) {
    return Integer.compare(x, o.x);
}
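
The benchmarked versions operate on MyObj rather than Foo directly, so MyObj needs matching implementations as well. A minimal sketch of a wrapper that simply delegates to Foo (the delegation is an assumption; the benchmark's actual MyObj is not shown here):

public class MyObj implements Comparable<MyObj> {
    private final Foo foo;

    public MyObj(Foo foo) {
        this.foo = foo;
    }

    @Override
    public int hashCode() {
        return foo.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return this == o || (o != null && getClass() == o.getClass() && foo.equals(((MyObj) o).foo));
    }

    @Override
    public int compareTo(MyObj o) {
        return foo.compareTo(o.foo);
    }
}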

Presorted elements could be generated via:

foos = IntStream.range(0, size)
                .mapToObj(x -> new Foo((int)(x*diversity)))
                .collect(Collectors.toList());

Random elements could be generated via:

foos = new Random().ints(size, 0, (int) (size * diversity))
                   .mapToObj(Foo::new)
                   .collect(Collectors.toList());
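
Putting the parameters and the two generators together, the benchmark state could look roughly like this (a sketch only; the class and field names are assumptions, not the actual benchmark source):

@State(Scope.Benchmark)
public class DistinctSortedState {
    @Param({"10", "1000", "100000"})
    int size;

    @Param({"1.0", "0.5", "0.2"})
    double diversity;

    @Param({"true", "false"})
    boolean presorted;

    List<Foo> foos;

    @Setup
    public void setup() {
        foos = presorted
                ? IntStream.range(0, size)
                           .mapToObj(x -> new Foo((int) (x * diversity)))
                           .collect(Collectors.toList())
                : new Random().ints(size, 0, (int) (size * diversity))
                              .mapToObj(Foo::new)
                              .collect(Collectors.toList());
    }
}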

Measurements were performed with JMH 1.13 on JDK 1.8.0_101 (VM 25.101-b13, 64-bit).

Results

Presorted (all times are in μs):

diversity size      (1)      (2)      (3)      (4)
  1         10      0.2      0.5      0.3      0.2
  1       1000     48.0     36.0     53.0     24.2
  1     100000  14165.7   4759.0  15177.3   3341.6
0.5         10      0.2      0.3      0.2      0.2
0.5       1000     36.9     23.1     41.6     20.1
0.5     100000  11442.1   2819.2  12508.7   2661.3
0.2         10      0.1      0.3      0.2      0.2
0.2       1000     32.0     13.0     29.8     16.7
0.2     100000   8491.6   1969.5   8971.9   1951.7

Not presorted:

diversity size      (1)      (2)      (3)      (4)
  1         10      0.2      0.4      0.2      0.3
  1       1000     72.8     77.4     73.6     72.7
  1     100000  21599.9  16427.1  22807.8  16322.2
0.5         10      0.2      0.3      0.2      0.2
0.5       1000     64.8     46.9     69.4     45.5
0.5     100000  20335.2  11190.3  20658.6  10806.7
0.2         10      0.1      0.3      0.2      0.2
0.2       1000     48.0     19.6     56.7     22.2
0.2     100000  16713.0   5533.4  16885.0   5930.6

Discussion

My initial guesses were in general correct. For presorted data, (2) and (4) are several times faster when we have 100,000 elements. The difference becomes even bigger when we have many duplicates, as duplicates don't increase the sorting time and duplicate insertion into a HashSet is much more efficient than duplicate insertion into a TreeSet. For random data the difference is less striking, as TreeSet performance depends much less on the input order than the TimSort algorithm (which Java uses to sort lists and arrays) does. For small data sets the simple TreeSet (1) is fast, but version (4) could be competitive as well.

The benchmark source code along with the raw results is available here.

Answered by Tagir Valeev


It is hard to give a good answer without analyzing the input. Anyway, I'll share my results:

I made Foo a container for a single long and MyObj a container for a single Foo. I also made all tests end with copying the data into a plain array, and I added two more approaches:

4. Simple array

@Benchmark
public void simpleArray(Blackhole bh) {
    MyObj[] ret = new MyObj[foos.size()];
    for (int i=0;i<foos.size();i++)
        ret[i] = new MyObj(foos.get(i));
    Arrays.sort(ret);
    // deduplicate in place: the array is sorted, so equal elements are adjacent
    int lastDistinct = 0;
    for (int i = 1; i < ret.length; i++) {
        if (ret[i].equals(ret[lastDistinct])) {
            continue;
        }
        lastDistinct++;
        ret[lastDistinct] = ret[i];
    }
    // copy the distinct prefix into a right-sized array
    MyObj[] ret2 = new MyObj[lastDistinct + 1];
    System.arraycopy(ret, 0, ret2, 0, lastDistinct + 1);
    bh.consume(ret2);
}

5. The simple stream (2) with the order of sorted and distinct reversed:

@Benchmark
public void simpleStream_distinctAfterSort(Blackhole bh) {
    List<MyObj> ret = foos.stream().map(MyObj::new)
            .sorted().distinct()
            .collect(Collectors.toList());
    bh.consume(ret.toArray(new MyObj[ret.size()]));
}

Test setup:

public static final int MAX_SIZE = 10_000;
public static final long ELEM_THRESHOLD = MAX_SIZE * 10;
private List<Foo> foos;

@Setup
public void init() throws IOException, IllegalAccessException, InstantiationException {
    foos = new ArrayList<>(MAX_SIZE);
    for (int i = 0; i < MAX_SIZE; i++) {
        foos.add(new Foo(ThreadLocalRandom.current().nextLong(ELEM_THRESHOLD)));
    }
}
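
For reference, a hedged sketch of the Foo and MyObj containers described at the top of this answer (the field names and the delegation are assumptions):

public class Foo implements Comparable<Foo> {
    final long value;

    public Foo(long value) { this.value = value; }

    @Override public int hashCode() { return Long.hashCode(value); }
    @Override public boolean equals(Object o) {
        return o instanceof Foo && value == ((Foo) o).value;
    }
    @Override public int compareTo(Foo o) { return Long.compare(value, o.value); }
}

public class MyObj implements Comparable<MyObj> {
    final Foo foo;

    public MyObj(Foo foo) { this.foo = foo; }

    @Override public int hashCode() { return foo.hashCode(); }
    @Override public boolean equals(Object o) {
        return o instanceof MyObj && foo.equals(((MyObj) o).foo);
    }
    @Override public int compareTo(MyObj o) { return foo.compareTo(o.foo); }
}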

Now the results with different sizes and thresholds:

Size=10_000
Threshold=Size*10
Benchmark                                         Mode  Cnt    Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2  478,978          ops/s
StreamBenchmark5.simpleArray                     thrpt    2  591,287          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  407,556          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2  533,091          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2  492,709          ops/s

Size=10_000
Threshold=Size/10
Benchmark                                         Mode  Cnt     Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2   937,908          ops/s
StreamBenchmark5.simpleArray                     thrpt    2   593,983          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  3344,508          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2   560,652          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2  1000,585          ops/s

Size=500_000
Threshold=Size*10
Benchmark                                         Mode  Cnt  Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2  1,809          ops/s
StreamBenchmark5.simpleArray                     thrpt    2  4,009          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  2,443          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2  4,141          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2  2,040          ops/s

Size=500_000
Threshold=Size/10
Benchmark                                         Mode  Cnt   Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2   5,724          ops/s
StreamBenchmark5.simpleArray                     thrpt    2   4,567          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  19,001          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2   4,840          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2   5,407          ops/s

Size=1_000_000
Threshold=Size/100
Benchmark                                         Mode  Cnt   Score   Error  Units
StreamBenchmark5.enhancedLoop_TreeSet            thrpt    2   4,529          ops/s
StreamBenchmark5.simpleArray                     thrpt    2   2,402          ops/s
StreamBenchmark5.simpleStream                    thrpt    2  35,699          ops/s
StreamBenchmark5.simpleStream_distinctAfterSort  thrpt    2   2,232          ops/s
StreamBenchmark5.treeSetStream                   thrpt    2   4,889          ops/s

As you can see, the preferable algorithm changes depending on the amount of duplicates. The most balanced approach is the TreeSet stream (3); however, the fastest one is almost always the simple stream, with sorted and distinct ordered to suit the input data.

Here's the test source if you want to play with it a bit. You'll need JMH.

Answered by user3707125