Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count items existing in 2 Lists

Tags:

I have two int type List like List A and List B. I want to check how many items of List A are there in List B. I am able to do this, but what can be an efficient way as I am trying to avoid foreach, as optimization is a prime target in my code.

List<int> A = new List<int>; List<int> B = new List<int>; // Some logic....item added in both lists. Then  foreach(var item in A) {     if (B.Contains(item))     {         // Subtract number of duplicates     } } 

I tried using Intersect and Any, but that returns bool so I'm not able to apply them completely.

like image 678
JulyOrdinary Avatar asked Aug 05 '13 09:08

JulyOrdinary


People also ask

How do you count similar elements in two lists in Python?

Using the intersection() Function This is the easiest method to find common elements in two lists in Python. As the name suggests, the intersection() function is a built-in python function that is used to return a set that contains the elements which are common in two sets.

How to count the number of lists in a list Python?

The most straightforward way to get the number of elements in a list is to use the Python built-in function len() . As the name function suggests, len() returns the length of the list, regardless of the types of elements in it.

How can I count the occurrences of a list item C #?

Use the list. count() method of the built-in list class to get the number of occurrences of an item in the given list.

How do you count the number of common elements in a list Python?

Method #3: Using set() + len() The task performed can also be perform using the set and len function. We can get a common element using set intersection and count the total common element using len function.


2 Answers

B.Intersect(A).Count(); //should do the job 
like image 96
It'sNotALie. Avatar answered Oct 18 '22 22:10

It'sNotALie.


Standard implementation B.Intersect(A).Count() has big advantage to be short and readable, unless you have a measured performance issue you should go with it.

When performance are an issue you can introduce HashSet<int>, it's a good compromise in resources usage and search time. However because you worry about performance we should perform some testing (I am using this free tool I wrote):

CPU: 1.8 GHz Pentium Core 2 Duo
Number of iterations: 100
Number of items in each list: 1000

A.Where(a => B.Contains(a)).Count(): 8338 ticks
A.Intersect(B).Count(): 288 ticks
B.Count - B.Except(A).Count(): 313 ticks

Let's now introduce HashSet<int> in our test (pick implementation from any other answer):

HashSet<int>: 163 ticks

It performs much better. Can we do better? If input range is known (and limited) you can do much better than this using BitArray. In this example I assume (for simplicity) only positive numbers but it's easy to be adapted.

public static int UseBitArray(int range, List<int> listA, List<int> listB) {     var BitArray array = new BitArray(range);     for (int i = 0; i < listA.Count; ++i)         array[listA[i]] = true;      int count = 0;     for (int i = 0; i < listB.Count; ++i) {         if (array[listB[i]])             ++count;     }      return count; } 

How it performs?

BitArray: 95 ticks

Performance comparison

It takes only 58% of 2nd best method (HashSet<int>). I don't even compare with others. Note that it uses memory heavily and for a wide range (let's say Int32.MaxValue / 2) it uses a lot of memory (moreover its size is limited to Int32.MaxValue then you can't have full signed 32 bit integer range. If its limitations aren't a problem for you then you should definitely go with it.

Also note that if you can do some assumptions about your inputs then you can further optimize your search function (for example if you can assume that sets are ordered).

How they scale up (Y axis scale is logarithmic):

Performance comparison with different input sets

Note that Except performs better than Intersect when number of items grow up. Also note that for such trivial object (an integer) you won't have any performance gain to do it in parallel (see also Finding the difference between two lists of strings): comparison is so trivial that overhead and syncrhonization are higher than benefits (unless it's a well-tuned algorithm on a VERY high number of items).

If you're really looking for last bit of performance gain you may even implement your own BitArray class (without unneeded stuff and error checking):

sealed class FastBitArray {     public FastBitArray(int length) {         m_array = new int[((length - 1) / 32) + 1];     }      public bool this[int index] {         get {             return (m_array[index / 32] & (1 << (index % 32))) != 0;         }         set {             if (value)                 m_array[index / 32] |= (1 << (index % 32));             else                 m_array[index / 32] &= ~(1 << (index % 32));         }     }      private int[] m_array; } 

Note that inside setter there is a branch, we don't have to worry to optimize it away because pattern is easy (always true) for branch predictor. No performance gain to make it more complicate than this.

Latest tests:

Number of iterations: 100
Number of items in each list: 1000000

HashSet<int>: 144748 ticks
BitArray: 37292 ticks
FastBitArray: 28966 ticks

Let's compare them visually (blue series is test with 1,000 items, orange series is 1,000,000; Y axis is logarithmic to easy comparison with 1k series). Methods we know are slow are simply omitted:

Performance comparison chart 1

Same data showing only 1M series and with linear Y axis:

Performance comparison chart 2

like image 32
Adriano Repetti Avatar answered Oct 18 '22 21:10

Adriano Repetti