<p>I have two int type <code>List</code> like <code>List A</code> and <code>List B</code>. I want to check how many items of <code>List A</code> are there in <code>List B</code>. I am able to do this, but what can be an efficient way as I am trying to avoid <code>foreach</code>, as optimization is a prime target in my code.</p> <pre class="prettyprint"><code>List<int> A = new List<int>; List<int> B = new List<int>; // Some logic....item added in both lists. Then foreach(var item in A) { if (B.Contains(item)) { // Subtract number of duplicates } } </code></pre> <p>I tried using <code>Intersect</code> and <code>Any</code>, but that returns <code>bool</code> so I'm not able to apply them completely.</p>

<pre class="prettyprint"><code>B.Intersect(A).Count(); //should do the job </code></pre>

Count items existing in 2 Lists

Tags:

I have two int type List like List A and List B. I want to check how many items of List A are there in List B. I am able to do this, but what can be an efficient way as I am trying to avoid foreach, as optimization is a prime target in my code.

List<int> A = new List<int>; List<int> B = new List<int>; // Some logic....item added in both lists. Then  foreach(var item in A) {     if (B.Contains(item))     {         // Subtract number of duplicates     } }

I tried using Intersect and Any, but that returns bool so I'm not able to apply them completely.

678

asked Aug 05 '13 09:08

JulyOrdinary

2 Answers

B.Intersect(A).Count(); //should do the job

answered Oct 18 '22 22:10

It'sNotALie.

Standard implementation B.Intersect(A).Count() has big advantage to be short and readable, unless you have a measured performance issue you should go with it.

When performance are an issue you can introduce HashSet<int>, it's a good compromise in resources usage and search time. However because you worry about performance we should perform some testing (I am using this free tool I wrote):

CPU: 1.8 GHz Pentium Core 2 Duo
Number of iterations: 100
Number of items in each list: 1000

A.Where(a => B.Contains(a)).Count(): 8338 ticks
A.Intersect(B).Count(): 288 ticks
B.Count - B.Except(A).Count(): 313 ticks

Let's now introduce HashSet<int> in our test (pick implementation from any other answer):

HashSet<int>: 163 ticks

It performs much better. Can we do better? If input range is known (and limited) you can do much better than this using BitArray. In this example I assume (for simplicity) only positive numbers but it's easy to be adapted.

public static int UseBitArray(int range, List<int> listA, List<int> listB) {     var BitArray array = new BitArray(range);     for (int i = 0; i < listA.Count; ++i)         array[listA[i]] = true;      int count = 0;     for (int i = 0; i < listB.Count; ++i) {         if (array[listB[i]])             ++count;     }      return count; }

How it performs?

BitArray: 95 ticks

Performance comparison

It takes only 58% of 2nd best method (HashSet<int>). I don't even compare with others. Note that it uses memory heavily and for a wide range (let's say Int32.MaxValue / 2) it uses a lot of memory (moreover its size is limited to Int32.MaxValue then you can't have full signed 32 bit integer range. If its limitations aren't a problem for you then you should definitely go with it.

Also note that if you can do some assumptions about your inputs then you can further optimize your search function (for example if you can assume that sets are ordered).

How they scale up (Y axis scale is logarithmic):

Performance comparison with different input sets

Note that Except performs better than Intersect when number of items grow up. Also note that for such trivial object (an integer) you won't have any performance gain to do it in parallel (see also Finding the difference between two lists of strings): comparison is so trivial that overhead and syncrhonization are higher than benefits (unless it's a well-tuned algorithm on a VERY high number of items).

If you're really looking for last bit of performance gain you may even implement your own BitArray class (without unneeded stuff and error checking):

sealed class FastBitArray {     public FastBitArray(int length) {         m_array = new int[((length - 1) / 32) + 1];     }      public bool this[int index] {         get {             return (m_array[index / 32] & (1 << (index % 32))) != 0;         }         set {             if (value)                 m_array[index / 32] |= (1 << (index % 32));             else                 m_array[index / 32] &= ~(1 << (index % 32));         }     }      private int[] m_array; }

Note that inside setter there is a branch, we don't have to worry to optimize it away because pattern is easy (always true) for branch predictor. No performance gain to make it more complicate than this.

Latest tests:

Number of iterations: 100
Number of items in each list: 1000000

HashSet<int>: 144748 ticks
BitArray: 37292 ticks
FastBitArray: 28966 ticks

Let's compare them visually (blue series is test with 1,000 items, orange series is 1,000,000; Y axis is logarithmic to easy comparison with 1k series). Methods we know are slow are simply omitted:

Performance comparison chart 1

Same data showing only 1M series and with linear Y axis:

Performance comparison chart 2

answered Oct 18 '22 21:10

Adriano Repetti

Related questions
                            
                                Sublime Text "Unable to save"
                            
                                TFS build broken - The directory is not empty on sources directory
                            
                                ConfigurationManager.GetSection Gives Error "Could not load type....from assembly..."
                            
                                async/await keywords not available in .net 4.0
                            
                                Angular, input field with a currency mask directive for money format on the fly
                            
                                SVG as source in <img> tag not displaying in Internet Explorer 11
                            
                                ERROR 1044 (42000): Access denied for 'root' With All Privileges
                            
                                The server encountered an error processing the request. See server logs for more details
                            
                                refresh DOM after append element
                            
                                Get color of each pixel of an image using BufferedImages
                            
                                Either to Try and vice versa in Scala
                            
                                What does fflush(stdin) do in C programing? [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With