I have two int type List
like List A
and List B
. I want to check how many items of List A
are there in List B
. I am able to do this, but what can be an efficient way as I am trying to avoid foreach
, as optimization is a prime target in my code.
List<int> A = new List<int>; List<int> B = new List<int>; // Some logic....item added in both lists. Then foreach(var item in A) { if (B.Contains(item)) { // Subtract number of duplicates } }
I tried using Intersect
and Any
, but that returns bool
so I'm not able to apply them completely.
Using the intersection() Function This is the easiest method to find common elements in two lists in Python. As the name suggests, the intersection() function is a built-in python function that is used to return a set that contains the elements which are common in two sets.
The most straightforward way to get the number of elements in a list is to use the Python built-in function len() . As the name function suggests, len() returns the length of the list, regardless of the types of elements in it.
Use the list. count() method of the built-in list class to get the number of occurrences of an item in the given list.
Method #3: Using set() + len() The task performed can also be perform using the set and len function. We can get a common element using set intersection and count the total common element using len function.
B.Intersect(A).Count(); //should do the job
Standard implementation B.Intersect(A).Count()
has big advantage to be short and readable, unless you have a measured performance issue you should go with it.
When performance are an issue you can introduce HashSet<int>
, it's a good compromise in resources usage and search time. However because you worry about performance we should perform some testing (I am using this free tool I wrote):
CPU: 1.8 GHz Pentium Core 2 Duo
Number of iterations: 100
Number of items in each list: 1000
A.Where(a => B.Contains(a)).Count()
: 8338 ticksA.Intersect(B).Count()
: 288 ticksB.Count - B.Except(A).Count()
: 313 ticks
Let's now introduce HashSet<int>
in our test (pick implementation from any other answer):
HashSet<int>
: 163 ticks
It performs much better. Can we do better? If input range is known (and limited) you can do much better than this using BitArray
. In this example I assume (for simplicity) only positive numbers but it's easy to be adapted.
public static int UseBitArray(int range, List<int> listA, List<int> listB) { var BitArray array = new BitArray(range); for (int i = 0; i < listA.Count; ++i) array[listA[i]] = true; int count = 0; for (int i = 0; i < listB.Count; ++i) { if (array[listB[i]]) ++count; } return count; }
How it performs?
BitArray
: 95 ticks
It takes only 58% of 2nd best method (HashSet<int>
). I don't even compare with others. Note that it uses memory heavily and for a wide range (let's say Int32.MaxValue / 2
) it uses a lot of memory (moreover its size is limited to Int32.MaxValue
then you can't have full signed 32 bit integer range. If its limitations aren't a problem for you then you should definitely go with it.
Also note that if you can do some assumptions about your inputs then you can further optimize your search function (for example if you can assume that sets are ordered).
How they scale up (Y axis scale is logarithmic):
Note that Except
performs better than Intersect
when number of items grow up. Also note that for such trivial object (an integer) you won't have any performance gain to do it in parallel (see also Finding the difference between two lists of strings): comparison is so trivial that overhead and syncrhonization are higher than benefits (unless it's a well-tuned algorithm on a VERY high number of items).
If you're really looking for last bit of performance gain you may even implement your own BitArray
class (without unneeded stuff and error checking):
sealed class FastBitArray { public FastBitArray(int length) { m_array = new int[((length - 1) / 32) + 1]; } public bool this[int index] { get { return (m_array[index / 32] & (1 << (index % 32))) != 0; } set { if (value) m_array[index / 32] |= (1 << (index % 32)); else m_array[index / 32] &= ~(1 << (index % 32)); } } private int[] m_array; }
Note that inside setter there is a branch, we don't have to worry to optimize it away because pattern is easy (always true
) for branch predictor. No performance gain to make it more complicate than this.
Latest tests:
Number of iterations: 100
Number of items in each list: 1000000
HashSet<int>
: 144748 ticksBitArray
: 37292 ticksFastBitArray
: 28966 ticks
Let's compare them visually (blue series is test with 1,000 items, orange series is 1,000,000; Y axis is logarithmic to easy comparison with 1k series). Methods we know are slow are simply omitted:
Same data showing only 1M series and with linear Y axis:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With