Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java - Intersection of multiple collections using stream + lambdas

I have the following function for the unification of multiple collections (includes repeated elements):

public static <T> List<T> unify(Collection<T>... collections) {
        return Arrays.stream(collections)
               .flatMap(Collection::stream)
               .collect(Collectors.toList()); 
}

It would be nice to have a function with a similar signature for the intersection of collections (using type equality). For example:

public static <T> List<T> intersect(Collection<T>... collections) {
     //Here is where the magic happens
}

I found an implementation of the intersect function, but it doesnt use streams:

public static <T> Set<T> intersect(Collection<? extends Collection<T>> collections) {
    Set<T> common = new LinkedHashSet<T>();
    if (!collections.isEmpty()) {
       Iterator<? extends Collection<T>> iterator = collections.iterator();
       common.addAll(iterator.next());
       while (iterator.hasNext()) {
          common.retainAll(iterator.next());
       }
    }
    return common;
}

Is there any way to implement something similar to the unify function making use of streams? Im not so experienced in java8/stream api, because of that some advice would be really helpful.

like image 380
Jota.Toledo Avatar asked Feb 13 '17 22:02

Jota.Toledo


People also ask

How to combine collections using Java stream interface?

The Stream interface in Java API provides useful methods that make it easier to process collections. Let's take a look at two of its methods – concat () and flatMap () – that are used for combining collections. Once you obtain a Stream, you can perform aggregate operations on it. 2.1. Using the concat () Method

How to sort different collections using lambda expression with collections?

In this article Lambda Expression with Collections are discussed with examples of sorting different collections like ArrayList, TreeSet, TreeMap, etc. We can use Comparator interface to sort, It only contain one abstract method: – compare (). An interface that only contain only a single abstract method then it is called as Functional Interface.

How to combine collections in Java 8?

Using Java 8 Stream API The Stream interface in Java API provides useful methods that make it easier to process collections. Let's take a look at two of its methods – concat () and flatMap () – that are used for combining collections. Once you obtain a Stream, you can perform aggregate operations on it. 2.1. Using the concat () Method

What is the use of stream interface in Java?

The Stream interface in Java API provides useful methods that make it easier to process collections. Let's take a look at two of its methods – concat () and flatMap () – that are used for combining collections. Once you obtain a Stream, you can perform aggregate operations on it. 2.1.


3 Answers

You can write your own collector in some utility class and use it:

public static <T, S extends Collection<T>> Collector<S, ?, Set<T>> intersecting() {
    class Acc {
        Set<T> result;

        void accept(S s) {
            if(result == null) result = new HashSet<>(s);
            else result.retainAll(s);
        }

        Acc combine(Acc other) {
            if(result == null) return other;
            if(other.result != null) result.retainAll(other.result);
            return this;
        }
    }
    return Collector.of(Acc::new, Acc::accept, Acc::combine, 
                        acc -> acc.result == null ? Collections.emptySet() : acc.result, 
                        Collector.Characteristics.UNORDERED);
}

The usage would be pretty simple:

Set<T> result = Arrays.stream(collections).collect(MyCollectors.intersecting());

Note however that collector cannot short-circuit: even if intermediate result will be an empty collection, it will still process the rest of the stream.

Such collector is readily available in my free StreamEx library (see MoreCollectors.intersecting()). It works with normal streams like above, but if you use it with StreamEx (which extends normal stream) it becomes short-circuiting: the processing may actually stop early.

like image 58
Tagir Valeev Avatar answered Oct 09 '22 19:10

Tagir Valeev


While it’s tempting to think of retainAll as a black-box bulk operation that must be the most efficient way to implement an intersection operation, it just implies iterating over the entire collection and testing for each element whether it is contained in the collection passed as argument. The fact that you are calling it on a Set does not imply any advantage, as it is the other collection, whose contains method will determine the overall performance.

This implies that linearly scanning a set and testing each element for containment within all other collections will be on par with performing retainAll for each collection. Bonus points for iterating over the smallest collection in the first place:

public static <T> Set<T> intersect(Collection<? extends Collection<T>> collections) {
    if(collections.isEmpty()) return Collections.emptySet();
    Collection<T> smallest
        = Collections.min(collections, Comparator.comparingInt(Collection::size));
    return smallest.stream().distinct()
        .filter(t -> collections.stream().allMatch(c -> c==smallest || c.contains(t)))
        .collect(Collectors.toSet());
}

or, alternatively

public static <T> Set<T> intersect(Collection<? extends Collection<T>> collections) {
    if(collections.isEmpty()) return Collections.emptySet();
    Collection<T> smallest
        = Collections.min(collections, Comparator.comparingInt(Collection::size));
    HashSet<T> result=new HashSet<>(smallest);
    result.removeIf(t -> collections.stream().anyMatch(c -> c!=smallest&& !c.contains(t)));
    return result;
}
like image 41
Holger Avatar answered Oct 09 '22 20:10

Holger


I think maybe it would make more sense to use Set instead of List (maybe that was a typo in your question):

public static <T> Set<T> intersect(Collection<T>... collections) {
     //Here is where the magic happens
     return (Set<T>) Arrays.stream(collections).reduce(
             (a,b) -> {
                 Set<T> c = new HashSet<>(a);
                 c.retainAll(b);
                 return c;
             }).orElseGet(HashSet::new);
}
like image 32
Patrick Parker Avatar answered Oct 09 '22 19:10

Patrick Parker