Suppose I have multiple Java 8 streams, each of which can potentially be converted into a Set<AppStory>. I now want to aggregate all the streams, with the best performance, into one stream that is DISTINCT by ID and sorted by the property "lastUpdate".
There are several ways to do this, but I want the fastest one. For example:
Set<AppStory> appStr1 = StreamSupport.stream(splititerato1, true)
        .map(storyId1 -> vertexToStory1(storyId1))
        .collect(toSet());

Set<AppStory> appStr2 = StreamSupport.stream(splititerato2, true)
        .map(storyId2 -> vertexToStory2(storyId2))
        .collect(toSet());

Set<AppStory> appStr3 = StreamSupport.stream(splititerato3, true)
        .map(storyId3 -> vertexToStory3(storyId3))
        .collect(toSet());

Set<AppStory> set = new HashSet<>();
set.addAll(appStr1);
set.addAll(appStr2);
set.addAll(appStr3);

...and then sort by "lastUpdate".
//POJO:
public class AppStory implements Comparable<AppStory> {

    private String storyId;
    // ... many other attributes ...

    public String getStoryId() {
        return storyId;
    }

    @Override
    public int compareTo(AppStory o) {
        return this.getStoryId().compareTo(o.getStoryId());
    }
}
...but that is the old way.
How can I create ONE stream, DISTINCT by ID and sorted by "lastUpdate", with the BEST PERFORMANCE? Something like:

Set<AppStory> finalSet = distinctStream.sorted((v1, v2) -> Integer.compare(/* not my issue */))
        .collect(toSet());

Any ideas?
BR
Vitaly
I think the parallel overhead is much greater than the actual work, as you stated in the comments, so let your Streams do the job sequentially.

FYI: you should prefer Stream::concat, because slicing operations like Stream::limit can be bypassed by Stream::flatMap.
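To make that flatMap point concrete, here is a small self-contained sketch (not taken from the question's code). On Java 8 the first pipeline typically pushes all five elements through peek even though limit(1) only needs one, while the concat pipeline stops after the first element; later JDKs largely fixed the flatMap behaviour:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class ConcatVsFlatMap {
    public static void main(String[] args) {
        // flatMap: on Java 8 the whole inner stream is pushed downstream,
        // so peek usually fires for every element despite limit(1).
        Stream.of(Arrays.asList(1, 2, 3, 4, 5))
              .flatMap(List::stream)
              .peek(i -> System.out.println("flatMap saw " + i))
              .limit(1)
              .forEach(i -> System.out.println("flatMap result " + i));

        // concat: the limit stays in control of the pipeline, so only the
        // first element is pulled through.
        Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5))
              .peek(i -> System.out.println("concat saw " + i))
              .limit(1)
              .forEach(i -> System.out.println("concat result " + i));
    }
}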
Stream::sorted collects every element of the Stream into a List, sorts that List, and then pushes the elements down the pipeline in the desired order, where they are collected again. That double pass can be avoided by collecting the elements into a List yourself and sorting afterwards. A List is a far better choice than a Set here because the order matters (there is a LinkedHashSet, but you can't sort it).
This is, in my opinion, the cleanest and maybe the fastest solution (we can't really prove which is fastest without measuring):
Stream<AppStory> appStr1 = StreamSupport.stream(splititerato1, false)
                                        .map(this::vertexToStory1);
Stream<AppStory> appStr2 = StreamSupport.stream(splititerato2, false)
                                        .map(this::vertexToStory2);
Stream<AppStory> appStr3 = StreamSupport.stream(splititerato3, false)
                                        .map(this::vertexToStory3);

// concatenate, deduplicate, collect once, then sort in place
List<AppStory> stories = Stream.concat(Stream.concat(appStr1, appStr2), appStr3)
                               .distinct()
                               .collect(Collectors.toList());

// assuming AppStory::getLastUpdateTime is of type `long`
stories.sort(Comparator.comparingLong(AppStory::getLastUpdateTime));
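One caveat worth hedging on: Stream::distinct deduplicates via equals/hashCode, not via compareTo, so for "distinct by ID" to work AppStory should override both based on storyId. A minimal sketch of that, with an assumed lastUpdateTime field backing the getLastUpdateTime accessor used above:

import java.util.Objects;

public class AppStory implements Comparable<AppStory> {

    private String storyId;
    private long lastUpdateTime; // assumed field backing getLastUpdateTime

    public String getStoryId() {
        return storyId;
    }

    public long getLastUpdateTime() {
        return lastUpdateTime;
    }

    @Override
    public int compareTo(AppStory o) {
        return this.getStoryId().compareTo(o.getStoryId());
    }

    // Stream::distinct relies on these two, so base them on the ID as well:
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AppStory)) return false;
        return Objects.equals(storyId, ((AppStory) o).storyId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(storyId);
    }
}

And if the call site really needs a Set rather than a List, wrapping the already-sorted list in a LinkedHashSet (new LinkedHashSet<>(stories)) preserves the sorted order.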