Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to filter out only the first element not matching a predicate in a Java sequential Stream?

I'm stuck on an edge case in java stream manipulations...

I want to code the following behavior: "From an arbitrary basket of fruits, collect the 20 smallest, except the smallest pear, because we don't want that."

Added bonus: the baskets to come might not have any pear at all.

Examples :

  • From [Pear 5, Apple 1, Apple 2, Apple 10, Pear 3, Pear 7], we want [Apple 1, Apple 2, Pear 5, Pear 7, Apple 10].
  • From [Apple 4, Apple 7, Pear 8, Pear 2, Pear 3], we want [Pear 3, Apple 4, Apple 7, Pear 8].

So far, I'm at this step:

output = basket.stream()
    .sorted(Comparator.comparing(Fruit::getSize))
    //.filter(???)
    .limit(20)
    .collect(fruitCollector);

This seems like a case of stateful lambda filter, and I don't know how to do that.

I can't use a local firstPear boolean and set it to true after filtering the first pear, since all local variables in a lambda must be final.

Worst case scenario I can split the basket in two, pears and non-pears, sort the pears, and sublist them appropriately if there is any. This seems very inefficient and ugly. Is there a better way?


[Edit] Answer comparison

There was much variety in the answers posted here, and most of them are valid. In order to give back to the community, I put together a small testing harness to compare the performance of these algorithms.

This comparison was not as extensive as I wanted - It's been 3 weeks already. It only covers usage for sequential processing of simple items. Feel free to give the testing harness a go, and add more tests, more benchmarks, or your own implementation.

My analysis:

Algorithm                | Author   | Perf | Comments
--------------------------------------------------------------------------------
Indexed removal          | Holger   | Best | Best overall, somewhat obscure
Stateful predicate       | pedromss | Best | Do not use for parallel processing
Straightforward approach | Misha    | Best | Better when few elements match
Custom collector         | Eugene   | Good | Better when all or no element match
Comaprator hack w/ dummy | yegodm   | Good | -
Comparator hack          | xenteros | *    | Perf sensitive to output size, fails on edge cases.

I acecpted pedromss' answer, as it's the one we implemented in the project, due to both its good performance, and "black-box" capabilities (the state-managing code is in an external class, and contributors can focus on the business logic).

Note that the accepted answer might not be the best for you: review the others, or check my testing project to see for yourself.

like image 683
Silver Quettier Avatar asked Aug 28 '17 14:08

Silver Quettier


4 Answers

Have you considered a straightforward approach? Find the smallest pear, filter it out (if it exists) and collect 20 smallest:

Optional<Fruit> smallestPear = basket.stream()
        .filter(Fruit::isPear)  // or whatever it takes to test if it's a pear
        .min(Fruit::getSize);

Stream<Fruit> withoutSmallestPear = smallestPear
        .map(p -> basket.stream().filter(f -> f != p))
        .orElseGet(basket::stream);

List<Fruit> result = withoutSmallestPear
        .sorted(comparing(Fruit::getSize))
        .limit(20)
        .collect(toList());
like image 177
Misha Avatar answered Nov 02 '22 20:11

Misha


As far as I can tell this has custom written all over it, so I did try a custom collector here:

private static <T> Collector<T, ?, List<T>> exceptCollector(Predicate<T> predicate, int size, Comparator<T> comparator) {

    class Acc {

        private TreeSet<T> matches = new TreeSet<>(comparator);

        private TreeSet<T> doesNot = new TreeSet<>(comparator);

        void accumulate(T t) {
            if (predicate.test(t)) {
                matches.add(t);
            } else {
                doesNot.add(t);
            }
        }

        Acc combine(Acc other) {

            matches.addAll(other.matches);
            doesNot.addAll(other.doesNot);

            return this;
        }

        List<T> finisher() {
            T smallest = matches.first();
            if (smallest != null) {
                matches.remove(smallest);
            }

            matches.addAll(doesNot);
            return matches.stream().limit(size).collect(Collectors.toList());
        }

    }
    return Collector.of(Acc::new, Acc::accumulate, Acc::combine, Acc::finisher);
}

And usage would be:

List<Fruit> fruits = basket.getFruits()
            .stream()
            .collect(exceptCollector(Fruit::isPear, 20, Comparator.comparing(Fruit::getSize)));
like image 32
Eugene Avatar answered Nov 02 '22 18:11

Eugene


For easier implementation, I attach an example for:

class Fruit {
    String name;
    Long size;
}

The following will work:

Comparator<Fruit> fruitComparator = (o1, o2) -> {

    if (o1.getName().equals("Peach") && o2.getName().equals("Peach")) {
        return o2.getSize().compareTo(o1.getSize()); //reverse order of Peaches
    }

    if (o1.getName().equals("Peach")) {
        return 1;
    }
    if (o2.getName().equals("Peach")) {
        return -1;
    }
    return o1.getSize().compareTo(o2.getSize());
};

And:

output = basket.stream()
    .sorted(Comparator.comparing(Fruit::getSize))
    .limit(21)
    .sorted(fruitComparator)
    .limit(20)
    .sorted(Comparator.comparing(Fruit::getSize))
    .collect(fruitCollector);

My comparator will put the smallest Peach to the 21st position, will keep the order of other Fruits natural, so in case there is not Peach, it will return 21st largest element. Then I sort the rest in the normal order.

This will work. It's a hack and at some circumstances might be a bad choice. I'd like to point out, that sorting 20 elements shouldn't be an issue.

like image 5
xenteros Avatar answered Nov 02 '22 20:11

xenteros


You can use a stateful predicate:

class StatefulPredicate<T> implements Predicate<T> {

    private boolean alreadyFiltered;
    private Predicate<T> pred;

    public StatefulPredicate(Predicate<T> pred) {
        this.pred = pred;
        this.alreadyFiltered = false;
    }

    @Override
    public boolean test(T t) {
        if(alreadyFiltered) {
            return true;
        }

        boolean result = pred.test(t);
        alreadyFiltered = !result;
        return result;
    }
}

    Stream.of(1, -1, 3, -4, -5, 6)
        .filter(new StatefulPredicate<>(i -> i > 0))
        .forEach(System.out::println);

Prints: 1, 3, -4, -5, 6

If concurrency is an issue you can use an atomic boolean.

If you wish to skip more than 1 element, add that parameter to your constructor and build your logic inside StatefulPredicate

This predicate filters the first negative element and then lets every other element pass, regardless. In your case you should test for instanceof Pear

Edit

Since people showed concerns about filter being stateless, from the documentation:

Intermediate operations are further divided into stateless and stateful operations. Stateless operations, such as filter and map, retain no state from previously seen element when processing a new element -- each element can be processed independently of operations on other elements. Stateful operations, such as distinct and sorted, may incorporate state from previously seen elements when processing new elements.

That predicate doesn't retain information about previously seen elements. It retains information about previous results.

Also it can be made thread safe to avoid concurrency issues.

like image 3
pedromss Avatar answered Nov 02 '22 18:11

pedromss