I'm stuck on an edge case in java stream manipulations...
I want to code the following behavior: "From an arbitrary basket of fruits, collect the 20 smallest, except the smallest pear, because we don't want that."
Added bonus: the baskets to come might not have any pear at all.
Examples :
So far, I'm at this step:
output = basket.stream()
.sorted(Comparator.comparing(Fruit::getSize))
//.filter(???)
.limit(20)
.collect(fruitCollector);
This seems like a case of stateful lambda filter, and I don't know how to do that.
I can't use a local firstPear
boolean and set it to true
after filtering the first pear, since all local variables in a lambda must be final.
Worst case scenario I can split the basket in two, pears and non-pears, sort the pears, and sublist them appropriately if there is any. This seems very inefficient and ugly. Is there a better way?
There was much variety in the answers posted here, and most of them are valid. In order to give back to the community, I put together a small testing harness to compare the performance of these algorithms.
This comparison was not as extensive as I wanted - It's been 3 weeks already. It only covers usage for sequential processing of simple items. Feel free to give the testing harness a go, and add more tests, more benchmarks, or your own implementation.
My analysis:
Algorithm | Author | Perf | Comments -------------------------------------------------------------------------------- Indexed removal | Holger | Best | Best overall, somewhat obscure Stateful predicate | pedromss | Best | Do not use for parallel processing Straightforward approach | Misha | Best | Better when few elements match Custom collector | Eugene | Good | Better when all or no element match Comaprator hack w/ dummy | yegodm | Good | - Comparator hack | xenteros | * | Perf sensitive to output size, fails on edge cases.
I acecpted pedromss' answer, as it's the one we implemented in the project, due to both its good performance, and "black-box" capabilities (the state-managing code is in an external class, and contributors can focus on the business logic).
Note that the accepted answer might not be the best for you: review the others, or check my testing project to see for yourself.
Have you considered a straightforward approach? Find the smallest pear, filter it out (if it exists) and collect 20 smallest:
Optional<Fruit> smallestPear = basket.stream()
.filter(Fruit::isPear) // or whatever it takes to test if it's a pear
.min(Fruit::getSize);
Stream<Fruit> withoutSmallestPear = smallestPear
.map(p -> basket.stream().filter(f -> f != p))
.orElseGet(basket::stream);
List<Fruit> result = withoutSmallestPear
.sorted(comparing(Fruit::getSize))
.limit(20)
.collect(toList());
As far as I can tell this has custom written all over it, so I did try a custom collector here:
private static <T> Collector<T, ?, List<T>> exceptCollector(Predicate<T> predicate, int size, Comparator<T> comparator) {
class Acc {
private TreeSet<T> matches = new TreeSet<>(comparator);
private TreeSet<T> doesNot = new TreeSet<>(comparator);
void accumulate(T t) {
if (predicate.test(t)) {
matches.add(t);
} else {
doesNot.add(t);
}
}
Acc combine(Acc other) {
matches.addAll(other.matches);
doesNot.addAll(other.doesNot);
return this;
}
List<T> finisher() {
T smallest = matches.first();
if (smallest != null) {
matches.remove(smallest);
}
matches.addAll(doesNot);
return matches.stream().limit(size).collect(Collectors.toList());
}
}
return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::finisher);
}
And usage would be:
List<Fruit> fruits = basket.getFruits()
.stream()
.collect(exceptCollector(Fruit::isPear, 20, Comparator.comparing(Fruit::getSize)));
For easier implementation, I attach an example for:
class Fruit {
String name;
Long size;
}
The following will work:
Comparator<Fruit> fruitComparator = (o1, o2) -> {
if (o1.getName().equals("Peach") && o2.getName().equals("Peach")) {
return o2.getSize().compareTo(o1.getSize()); //reverse order of Peaches
}
if (o1.getName().equals("Peach")) {
return 1;
}
if (o2.getName().equals("Peach")) {
return -1;
}
return o1.getSize().compareTo(o2.getSize());
};
And:
output = basket.stream()
.sorted(Comparator.comparing(Fruit::getSize))
.limit(21)
.sorted(fruitComparator)
.limit(20)
.sorted(Comparator.comparing(Fruit::getSize))
.collect(fruitCollector);
My comparator will put the smallest Peach to the 21st position, will keep the order of other Fruit
s natural, so in case there is not Peach
, it will return 21st largest element. Then I sort the rest in the normal order.
This will work. It's a hack and at some circumstances might be a bad choice. I'd like to point out, that sorting 20 elements shouldn't be an issue.
You can use a stateful predicate:
class StatefulPredicate<T> implements Predicate<T> {
private boolean alreadyFiltered;
private Predicate<T> pred;
public StatefulPredicate(Predicate<T> pred) {
this.pred = pred;
this.alreadyFiltered = false;
}
@Override
public boolean test(T t) {
if(alreadyFiltered) {
return true;
}
boolean result = pred.test(t);
alreadyFiltered = !result;
return result;
}
}
Stream.of(1, -1, 3, -4, -5, 6)
.filter(new StatefulPredicate<>(i -> i > 0))
.forEach(System.out::println);
Prints: 1, 3, -4, -5, 6
If concurrency is an issue you can use an atomic boolean.
If you wish to skip more than 1 element, add that parameter to your constructor and build your logic inside StatefulPredicate
This predicate filters the first negative element and then lets every other element pass, regardless. In your case you should test for instanceof Pear
Since people showed concerns about filter being stateless, from the documentation:
Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.
That predicate doesn't retain information about previously seen elements. It retains information about previous results.
Also it can be made thread safe to avoid concurrency issues.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With