Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast intersection of HashSet<int> and List<int>

I have a HashSet<int> and a List<int> (Hashset has approximately 3 Million items, List has approximately 300k items).

I currently intersect them using

var intersected = hashset.Intersect(list).ToArray();

and I wonder if there is any faster way to do so. Maybe in parallel?

like image 573
user2033412 Avatar asked May 01 '20 17:05

user2033412


People also ask

What is the use of intersectwith method in Java HashSet?

HashSet.IntersectWith (IEnumerable) Method is used to modify the current HashSet object to contain only elements that are present in that object and in the specified collection.

How do I intersect a HashSet in IEnumerable?

private static IEnumerable<int> Intersect (HashSet<int> hash, List<int> list) { HashSet<int> intersect = new HashSet<int> (list); intersect.IntersectWith (hash); return intersect; }

How to convert list to HashSet in Java?

Convert List to HashSet: 1. Using HashSet Constructor We can make use of HashSet constructorwhich takes an IEnumerable<T>to construct a new instance of the HashSet<T>class containing distinct elements of the list. Note that since HashSetcontains distinct elements, any repeated elements in the list will be discarded.

What is the initialcapacity of a HashSet in Java?

HashSet (int initialCapacity): This constructor is used to build an empty HashSet object in which the initialCapacity is specified at the time of object creation. Here, the default loadFactor remains 0.75.


2 Answers

HashSet has a method IntersectWith that is optimized if intersection is performed between two hash sets. Using method IntersectWith we can intersect HashSet and List using the next approach:

private static IEnumerable<int> Intersect(HashSet<int> hash, List<int> list)
{
    HashSet<int> intersect = new HashSet<int>(list);
    intersect.IntersectWith(hash);
    return intersect;
}

I have measured (using Stopwatch) performance of your original method (Linq Intersect), methods proposed by @TheodorZoulias (HashSet Contains and HashSet Contains Parallel) and my method (HashSet IntersectWith). Here are results:

------------------------------------------------------------------------
|         Method            | Min, ms | Max, ms | Avg, ms | StdDev, ms |
------------------------------------------------------------------------
| Linq Intersect            |   135   |   274   |   150   |     17     |
| HashSet Contains          |    25   |    44   |    26   |      2     |
| HashSet Contains Parallel |    12   |    53   |    13   |      3     |
| HashSet IntersectWith     |    57   |    89   |    61   |      4     |
------------------------------------------------------------------------

From the table we can see that the fastest method is HashSet Contains Parallel and the slowest is Linq Intersect.


Here is complete source code that was used to measure performance.

like image 92
Iliar Turdushev Avatar answered Oct 27 '22 01:10

Iliar Turdushev


Yes, you can go faster because you have already a HashSet in hand. The LINQ Intersect uses a generic algorithm, that essentially recreates a HashSet from scratch every time it's called. Here is a faster algorithm:

/// <summary>Yields all the elements of first (including duplicates) that also
/// appear in second, in the order in which they appear in first.</summary>
public static IEnumerable<TSource> Intersect<TSource>(IEnumerable<TSource> first,
    HashSet<TSource> second)
{
    foreach (TSource element in first)
    {
        if (second.Contains(element)) yield return element;
    }
}

Update: Here is a parallel version of the above idea:

var intersected = list.AsParallel().Where(x => hashset.Contains(x)).ToArray();

I wouldn't expect it to be much faster, if at all, because the workload is too granular. The overhead of calling a lambda 300,000 times will probably overshadow any benefits of the parallelism.

Also the order of the results will not be preserved, unless the AsOrdered PLINQ method is added in the query, hurting further the performance of the operation.

like image 38
Theodor Zoulias Avatar answered Oct 27 '22 01:10

Theodor Zoulias