Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast count() intersection of two string arrays

I need to count the number of elements corresponding to the intersection of two big arrays of strings and do it very fast.

I am using the following code:

arr1[i].Intersect(arr2[j]).Count()

For CPU Time, VS Profiler indicates

  • 85.1% in System.Linq.Enumerable.Count()
  • 0.3% in System.Linq.Enumerable.Intersect()

Unfortunately it might to take hours to do all work.

How to do it faster?

like image 298
LeMoussel Avatar asked Dec 16 '12 10:12

LeMoussel


1 Answers

You can use HashSet with arr2

HashSet<string> arr2Set = new HashSet<string>(arr2);
arr1.Where(x=>arr2Set.Contains(x)).Count();
              ------------------
                      |
                      |->HashSet's contains method executes quickly using hash-based lookup..

Not considering the conversion from arr2 to arr2Set ,this should be O(n)

like image 98
Anirudha Avatar answered Oct 29 '22 11:10

Anirudha