Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient data structure to find intersection of two lists

I have two very large List<List<int>> A and B. I need to find intersection in between each element of those lists.

A[0] = { 1, 2, 3};
B[0] = {2, 3, 4};

Intersection = { 2, 3 };

My implementation:

List<int> intersection = A[0].Intersection(B[0]).ToList();

This solution takes very long time to execute. I am wondering if there are any better way to do this and any more efficient data structure I can use to perform it in better time.

Thanks!

like image 911
John Latham Avatar asked Feb 18 '23 13:02

John Latham


2 Answers

You should use a Hashset for this, in C# HashSet<T>. Lookups in hashsets are O(1) (if decent hashing function and using an array underneath) as opposed to O(n) for lists.

Using Linq in C# you basically get this "built-in": Intersect() will use a hashset internally to compute the intersection in O(n) instead of O(n^2) if using two lists.

var intersection = a.Intersect(b).ToList();
like image 74
BrokenGlass Avatar answered Mar 16 '23 00:03

BrokenGlass


Code sample using HashSet(T).IntersectWith:

HashSet<string> lst1 = new HashSet<string> 

     { "id1", "id2", "id3" };

HashSet<string> lst2 = new HashSet<string> 

     { "id2", "id3", "id4" };

// what happens is that, lst1 will be modified by only leaving the intersect items
lst1.IntersectWith(lst2);

PS: I used the sample for String, but you can utilize your own integer values.

like image 31
aspiring Avatar answered Mar 15 '23 23:03

aspiring