I am looking for a memory-efficient way in Java to find top n elements from a huge collection. For instance, I have a word, a distance() method, and a collection of "all" words. I have implemented a class Pair that implements compareTo() so that pairs are sorted by their values.
Using streams, my naive solution looks like this:
double distance(String word1, String word2){
...
}
Collection<String> words = ...;
String word = "...";
words.stream()
.map(w -> new Pair<String, Double>(w, distance(word, w)))
.sorted()
.limit(n);
To my understanding, this will process and intermediately store each element in words so that it can be sorted before applying limit(). However, it is more memory-efficient to have a collection that stores n elements and whenever a new element is added, it removes the smallest element (according to the comparable object's natural order) and thus never grows larger than n (or n+1).
This is exactly what the Guava MinMaxPriorityQueue does. Thus, my current best solution to the above problem is this:
Queue<Pair<String, Double>> neighbours = MinMaxPriorityQueue.maximumSize(n).create();
words.stream()
.forEach(w -> neighbours.add(new Pair<String, Double>(w, distance(word, w)));
The sorting of the top n elements remains to be done after converting the queue to a stream or list, but this is not an issue since n is relatively small.
My question is: is there a way to do the same using streams?
A heap-based structure will of course be more efficient than sorting the entire huge list. Luckily, streams library is perfectly happy to let you use specialized collections when necessary:
MinMaxPriorityQueue<Pair<String, Double>> topN = words.stream()
.map(w -> new Pair<String, Double>(w, distance(word, w)))
.collect(toCollection(
() -> MinMaxPriorityQueue.maximumSize(n).create()
));
This is better than the .forEach
solution because it's easy to parallelize and is more idiomatic java8.
Note that () -> MinMaxPriorityQueue.maximumSize(n).create()
should be possible to be replaced with MinMaxPriorityQueue.maximumSize(n)::create
but, for some reason, that won't compile under some conditions (see comments below).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With