
PartitioningBy with limit

I need to split a list into two lists by a predicate, while limiting the number of elements that go into the true part.
E.g. let's say I have the list A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] and I want to split it by the predicate o -> o % 2 == 0 with a limit of 3.
I want to get Map<Boolean, List<Integer>> where:

true -> [2, 4, 6] // objects by predicate and with limit (actually, order is not important)
false -> [1, 3, 5, 7, 8, 9, 10]  // All other objects

Java 8 has a collector that splits a stream by a predicate, Collectors.partitioningBy(...), but it doesn't support limits. Is it possible to do this with Java 8 streams / Guava / Apache Commons, or should I write my own implementation of this function?

EDIT: I wrote this function. If you have any suggestions, feel free to tell me. MultiValuedMap is optional and can be replaced with Map.

private <E> MultiValuedMap<Boolean, E> partitioningByWithLimit(Predicate<E> predicate, List<E> src, int limit) {
    MultiValuedMap<Boolean, E> result = new ArrayListValuedHashMap<>();
    Iterator<E> iterator = src.iterator();
    while (iterator.hasNext()) {
        E next = iterator.next();
        if (limit > 0 && predicate.test(next)) {
            result.put(true, next);
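            // Note: this mutates src and throws UnsupportedOperationException
            // for fixed-size lists such as those returned by Arrays.asList.
            iterator.remove();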
            limit--;
        }
    }
    result.putAll(false, src);
    return result;
}
Feedforward asked May 17 '17 12:05



2 Answers

Here is a way to do it based on a custom collector:

public static <E> Collector<E, ?, Map<Boolean, List<E>>> partitioningByWithLimit(
        Predicate<E> predicate,
        int limit) {

    class Acc {
        Map<Boolean, List<E>> map = new HashMap<>();

        Acc() {
            map.put(true, new ArrayList<>());
            map.put(false, new ArrayList<>());
        }

        void add(E elem) {
            int size = map.get(true).size();
            boolean key = size < limit && predicate.test(elem);
            map.get(key).add(elem);
        }

        Acc combine(Acc another) {
            another.map.get(true).forEach(this::add);
            another.map.get(false).forEach(this::add);
            return this;
        }
    }

    return Collector.of(Acc::new, Acc::add, Acc::combine, acc -> acc.map);
}

I'm using a local Acc class that wraps the map and exposes the logic to accumulate and combine elements of the stream into a map. This map is partitioned according to the predicate and limit provided.

At the end, I'm creating the collector with Collector.of, passing the supplier, accumulator, combiner and finisher.

Usage:

List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

Map<Boolean, List<Integer>> map = list.stream()
        .collect(partitioningByWithLimit(n -> n % 2 == 0, 3));

Output is:

{false=[1, 3, 5, 7, 8, 9, 10], true=[2, 4, 6]}

The main advantage of this approach is that it also supports parallel streams.
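
To make the parallel claim concrete, here is a minimal, self-contained sketch that runs the same collector on a parallel stream. The class name ParallelLimitDemo and the main method are only illustrative assumptions and not part of the answer above; the collector body repeats the one shown earlier so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collector;

// Hypothetical wrapper class, used only to make the sketch compile on its own.
final class ParallelLimitDemo {

    static <E> Collector<E, ?, Map<Boolean, List<E>>> partitioningByWithLimit(
            Predicate<E> predicate, int limit) {

        // Accumulator holding both partitions; mirrors the Acc class above.
        class Acc {
            final Map<Boolean, List<E>> map = new HashMap<>();

            Acc() {
                map.put(true, new ArrayList<>());
                map.put(false, new ArrayList<>());
            }

            void add(E elem) {
                // An element goes to the true part only while there is room left.
                boolean key = map.get(true).size() < limit && predicate.test(elem);
                map.get(key).add(elem);
            }

            Acc combine(Acc other) {
                // Re-add the other chunk's elements so the limit is enforced
                // across chunks; overflow from its true part lands in false.
                other.map.get(true).forEach(this::add);
                other.map.get(false).forEach(this::add);
                return this;
            }
        }

        return Collector.of(Acc::new, Acc::add, Acc::combine, acc -> acc.map);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        Map<Boolean, List<Integer>> map = list.parallelStream()
                .collect(partitioningByWithLimit(n -> n % 2 == 0, 3));
        System.out.println(map);
    }
}
```

On an ordered parallel stream the true part still holds the first three matching elements, because the combiner always processes the left chunk before the right one. The order of the false part can differ from the sequential run, since overflow from a later chunk's true part is appended before that chunk's false part. That matches the question's remark that order is not important.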

fps answered Oct 22 '22 12:10



There is no clean Stream solution, as the task relies on a stateful predicate.

So your loop is not bad, but it can be cleaned up a bit:

private <E> MultiValuedMap<Boolean, E> partitioningByWithLimit(
                                       Predicate<E> predicate, List<E> src, int limit) {
    MultiValuedMap<Boolean, E> result = new ArrayListValuedHashMap<>();
    for(E next: src) {
        boolean key = limit>0 && predicate.test(next);
        result.put(key, next);
        if(key) limit--;
    }
    return result;
}
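
Since the question mentions that MultiValuedMap can be replaced with Map, the same loop might look roughly like this using only JDK types. This is a sketch under that assumption; the class name LimitedPartition is hypothetical and only there to make the snippet self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical holder class; only the static method matters.
final class LimitedPartition {

    static <E> Map<Boolean, List<E>> partitioningByWithLimit(
            Predicate<E> predicate, List<E> src, int limit) {
        // Pre-create both buckets so callers never see a missing key.
        Map<Boolean, List<E>> result = new HashMap<>();
        result.put(true, new ArrayList<>());
        result.put(false, new ArrayList<>());
        for (E next : src) {
            // Once the limit is used up, every element goes to the false part
            // and the predicate is no longer evaluated.
            boolean key = limit > 0 && predicate.test(next);
            result.get(key).add(next);
            if (key) limit--;
        }
        return result;
    }
}
```

Unlike the loop in the question, this version leaves src untouched, so it also works with fixed-size or immutable lists.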

If you really want to get the feeling of being a little faster when the limit has been reached, you may use

private <E> MultiValuedMap<Boolean, E> partitioningByWithLimit(
                                       Predicate<E> predicate, List<E> src, int limit) {
    MultiValuedMap<Boolean, E> result = new ArrayListValuedHashMap<>();
    for(Iterator<E> iterator = src.iterator(); iterator.hasNext(); ) {
        E next = iterator.next();
        boolean key = predicate.test(next);
        result.put(key, next);
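        // Assumes limit > 0: with limit <= 0 the countdown never reaches zero,
        // so every matching element would end up in the true part.
        if(key && --limit==0) iterator.forEachRemaining(result.get(false)::add);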
    }
    return result;
}

This avoids rechecking the limit, and even the map lookup, for the remaining elements. However, I wouldn’t expect a big performance difference, and the first variant is much simpler.

Another alternative, utilizing more Java 8 features, is

private <E> MultiValuedMap<Boolean, E> partitioningByWithLimit(
                                       Predicate<E> predicate, List<E> src, int limit) {
    MultiValuedMap<Boolean, E> result = new ArrayListValuedHashMap<>();
    result.putAll(false, src);
    List<E> pos = result.get(true);
    result.get(false).removeIf(e -> pos.size()<limit && predicate.test(e) && pos.add(e));
    return result;
}
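
The removeIf variant relies on the list views that MultiValuedMap.get returns. With a plain Map, roughly the same idea could be sketched as follows; the class name RemoveIfPartition is hypothetical, and the sketch assumes that removeIf tests each element once, in list order, which is how ArrayList behaves in practice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical holder class; only the static method matters.
final class RemoveIfPartition {

    static <E> Map<Boolean, List<E>> partitioningByWithLimit(
            Predicate<E> predicate, List<E> src, int limit) {
        List<E> pos = new ArrayList<>();
        List<E> neg = new ArrayList<>(src); // work on a copy, src stays untouched
        // Stateful filter: move matching elements to pos while there is room.
        // pos.add(e) always returns true, so a moved element is removed from neg.
        neg.removeIf(e -> pos.size() < limit && predicate.test(e) && pos.add(e));
        Map<Boolean, List<E>> result = new HashMap<>();
        result.put(true, pos);
        result.put(false, neg);
        return result;
    }
}
```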
Holger answered Oct 22 '22 13:10
