Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Intersection of two sets in most optimized way

Given two sets of values, I have to find whether there is any common element among them or not i.e. whether their intersection is null or not.

Which of the standard C# collection will suit best (in terms of performance) for this purpose? I know that linq has a Intersect extension method to find out the intersection of two list/arrays but my focus is on performance in terms of Big-O notation.

And what if I have to find out the intersection of two sets as well?

like image 538
Ravi Gupta Avatar asked Jan 25 '13 17:01

Ravi Gupta


People also ask

How do you find the intersection of two sets?

The intersection of two or more given sets is the set of elements that are common to each of the given sets. The intersection of sets is denoted by the symbol '∩'. In the case of independent events, we generally use the multiplication rule, P(A ∩ B) = P( A )P( B ).

What is meant by intersection of two sets?

The intersection of two sets is defined as the set containing elements in set A which are also present in set B; in other words, the common elements.

What is the use of intersection in sets?

Python Set intersection() Method The intersection() method returns a set that contains the similarity between two or more sets. Meaning: The returned set contains only items that exist in both sets, or in all sets if the comparison is done with more than two sets.


1 Answers

Well, if you use LINQ's Intersect method it will build up a HashSet of the second sequence, and then check each element of the first sequence against it. So it's O(M+N)... and you can use foo.Intersect(bar).Any() to get an early-out.

Of course, if you store one (either) set in a HashSet<T> to start with, you can just iterate over the other one checking for containment on each step. You'd still need to build the set to start with though.

Fundamentally you've got an O(M+N) problem whatever you do - you're not going to get cheaper than that (there's always the possibility that you'll have to look at every element) and if your hash codes are reasonable, you should be able to achieve that complexity easily. Of course, some solutions may give better constant factors than others... but that's performance rather than complexity ;)

EDIT: As noted in the comments, there's also ISet<T>.Overlaps - if you've already got either set with a static type of ISet<T> or a concrete implementation, calling Overlaps makes it clearer what you're doing. If both of your sets are statically typed as ISet<T>, use larger.Overlaps(smaller) (where larger and smaller are in terms of the size of the set) as I'd expect an implementation of Overlaps to iterate over the argument and check each element against contents of the set you call it on.

like image 85
Jon Skeet Avatar answered Sep 16 '22 18:09

Jon Skeet