Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can I check if Java 8 stream contains at least n elements

I have a Java 8 stream from which I want to (uniformly) randomly select an element. The stream can contain anywhere from zero to tens of thousands of elements.

I have implemented an algorithm that selects one using a MapReduce-like pattern, but for the very small streams it would probably be more efficient to just collect the items into a List and return one with a random index. For that I have to count them, however. Streams do have a count() method but that counts them all, I'm not really interested in the actual count, all I care about is whether it contains more than a to-be-determined number. Does anyone know if such a method exists? I can't find it but there might be something I'm overlooking or some clever trick for finding it anyway.

P.S.: I'm aware that sometimes it's not necessary to optimize code; but I would like to try it nonetheless, just for the experience. I'm a student.

P.P.S.: I've copied my algorithm here, in case anyone's interested (or wants to look for bugs, I haven't tested it yet ;-)

stream
    .parallel()
    .map(t -> new Pair<T, Integer>(t, 1))
    .reduce((Pair<T, Integer> t, Pair<T, Integer> u) -> {
        if (rand.nextDouble() <= (t.getValue1() / (double) (t.getValue1() + u.getValue1()))) {
            return new Pair<>(t.getValue0(), t.getValue1() + u.getValue1());
        } else {
            return new Pair<>(u.getValue0(), t.getValue1() + u.getValue1());
        }
    })
    .map(t -> t.getValue0());

(The pairs are from org.javatuples, now that Java supports functional programming-like interfaces the lack of tuples does become a bit painful).

like image 759
Pieter-Paul Kramer Avatar asked Jul 24 '15 10:07

Pieter-Paul Kramer


1 Answers

Your code does not return element from uniform distribution. It depends on the order, that stream provides elements to reduce method. In general case you can't consider that the order will not be the special one. Solving your task: if you have enough memory, it is possible to write RandomComparator (that saves previous results in Map), sort your stream with this comparator and get first element (don't use findAny). If stream is too large, it is possible to sample it with RandomFilter.

btw, if you have SIZED flag in your stream, task is trivial. Just get size, generate random index and make spip :)

like image 129
user2717921 Avatar answered Sep 21 '22 22:09

user2717921