Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stop java stream computations based on previous computation results

How to break stream computation based on previous results? If it's obvious that stream.filter(...).count() would be less than some number - how to stop stream computation?

I have the following code which checks if some sampleData passes the predicate test:

// sampleData.size() may be greater than 10.000.000
Set<String> sampleData = downloadFromWeb(); 
return sampleData.stream().filter(predicate::test).count() > sampleData.size() * coefficient;

I could have thousands of sampleData. The problem is that this code is ineffective. For example, if coefficient equals 0.5, sampleData.size() = 10_000_000, and first 5_000_000 elements fails the predicate::test - there is no reason to validate last 5_000_000 elements (count() will never be greater than 5_000_000).

like image 963
VB_ Avatar asked Sep 12 '17 10:09

VB_


Video Answer


2 Answers

ZhekaKozlov’s answer is heading into the right direction, but it lacks the negation. For the matches to be larger than a certain threshold, the number of non matching elements must be smaller than “size - threshold”. If we test for the nonmatching elements to be smaller, we can apply a limit to stop once they become larger:

Set<String> sampleData = downloadFromWeb();
final long threshold = sampleData.size()-(long)(sampleData.size() * coefficient);
return sampleData.stream()
                 .filter(predicate.negate()).limit(threshold+1).count() < threshold;

There is, by the way, no reason to create a method reference to the test method of an existing Predicate like with predicate::test. Just pass the Predicate to the filter method. The code above also uses predicate.negate() instead of predicate.negate()::test

like image 180
Holger Avatar answered Oct 22 '22 12:10

Holger


To be honest I am not quite sure this would be correct, I hope someone will come along and review this, but here is my idea of using a custom spliterator:

 static class CustomSpl<T> extends AbstractSpliterator<T> {

    private Spliterator<T> source;

    private int howMany;

    private int coefficient;

    private Predicate<T> predicate;

    private T current;

    private long initialSize;

    private void setT(T t) {
        this.current = t;
    }

    public CustomSpl(Spliterator<T> source, int howMany, int coefficient, Predicate<T> predicate, long initialSize) {
        super(source.estimateSize(), source.characteristics());
        this.source = source;
        this.howMany = howMany;
        this.coefficient = coefficient;
        this.predicate = predicate;
        this.initialSize = initialSize;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        boolean hasMore = source.tryAdvance(this::setT);

        System.out.println(current);

        if (!hasMore) {
            return false;
        }

        if (predicate.test(current)) {
            ++howMany;
        }

        if (initialSize - howMany <= coefficient) {
            return false;
        }

        action.accept(current);
        return true;
    }

}

And for example this will produce only 4 elements, since we said to only care having a coefficient 5:

Spliterator<Integer> sp = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10).stream().spliterator();

long count = StreamSupport.stream(new CustomSpl<>(sp, 0, 5, x -> x > 3, sp.getExactSizeIfKnown()), false)
            .count();

Also this is possible for spliterators with known size only.

like image 33
Eugene Avatar answered Oct 22 '22 11:10

Eugene