Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the best way to aggregate Streams into one DISTINCT with Java 8

Suppose i have multiple java 8 streams that each stream potentially can be converted into Set<AppStory> , now I want with the best performance to aggregate all streams into one DISTINCT stream by ID , sorted by property ("lastUpdate")

There are several ways to do what but i want the fastest one , for example:

Set<AppStory> appStr1 =StreamSupport.stream(splititerato1, true).
map(storyId1 -> vertexToStory1(storyId1).collect(toSet());

Set<AppStory> appStr2 =StreamSupport.stream(splititerato2, true).
map(storyId2 -> vertexToStory2(storyId1).collect(toSet());

Set<AppStory> appStr3 =StreamSupport.stream(splititerato3, true).
map(storyId3 -> vertexToStory3(storyId3).collect(toSet());


Set<AppStory> set = new HashSet<>();
set.addAll(appStr1)
set.addAll(appStr2)
set.addAll(appStr3) , and than make sort by "lastUpdate"..

//POJO Object:
public class AppStory implements Comparable<AppStory> {
private String storyId;
private String ........... many other attributes......
public String getStoryId() {
    return storyId;
}
@Override
public int compareTo(AppStory o) {
    return this.getStoryId().compareTo(o.getStoryId());
   }
}

... but it is the old way.

How can I create ONE DISTINCT by ID sorted stream with BEST PERFORMANCE

somethink like :

  Set<AppStory> finalSet = distinctStream.sort((v1, v2) -> Integer.compare('not my issue').collect(toSet())

Any Ideas ?

BR

Vitaly

like image 225
VitalyT Avatar asked May 15 '16 08:05

VitalyT


People also ask

What are the aggregate operations in Java 8 streams?

Aggregate operations − Stream supports aggregate operations like filter, map, limit, reduce, find, match, and so on. Pipelining − Most of the stream operations return stream itself so that their result can be pipelined.

How do I combine two streams in Java?

concat() in Java. Stream. concat() method creates a concatenated stream in which the elements are all the elements of the first stream followed by all the elements of the second stream. The resulting stream is ordered if both of the input streams are ordered, and parallel if either of the input streams is parallel.

What would be a good way of describing streams in Java 8?

Introduced in Java 8, the Stream API is used to process collections of objects. A stream is a sequence of objects that supports various methods which can be pipelined to produce the desired result. A stream is not a data structure instead it takes input from the Collections, Arrays or I/O channels.

How do I use distinct in stream?

distinct() performs stateful intermediate operation i.e, it maintains some state internally to accomplish the operation. Syntax : Stream<T> distinct() Where, Stream is an interface and the function returns a stream consisting of the distinct elements.


1 Answers

I think the parallel overhead is much greater than the actual work as you stated in the comments. So let your Streams do the job in sequential manner.

FYI: You should prefer using Stream::concat because slicing operations like Stream::limit can be bypassed by Stream::flatMap.

Stream::sorted is collecting every element in the Stream into a List, sort the List and then pushing the elements in the desired order down the pipeline. Then the elements are collected again. So this can be avoided by collecting the elements into a List and do the sorting afterwards. Using a List is a far better choice than using a Set because the order matters (I know there is a LinkedHashSet but you can't sort it).

This is the in my opinion the cleanest and maybe the fastest solution since we cannot prove it.

Stream<AppStory> appStr1 =StreamSupport.stream(splititerato1, false)
                                       .map(this::vertexToStory1);
Stream<AppStory> appStr2 =StreamSupport.stream(splititerato2, false)
                                       .map(this::vertexToStory2);
Stream<AppStory> appStr3 =StreamSupport.stream(splititerato3, false)
                                       .map(this::vertexToStory3);

List<AppStory> stories = Stream.concat(Stream.concat(appStr1, appStr2), appStr3)
                               .distinct().collect(Collectors.toList());
// assuming AppStory::getLastUpdateTime is of type `long`
stories.sort(Comparator.comparingLong(AppStory::getLastUpdateTime));
like image 183
Flown Avatar answered Oct 05 '22 15:10

Flown