Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

matching items from two lists (or arrays)

Tags:

arrays

c#

.net

list

I'm having a problem with my work that hopefully reduces to the following: I have two List<int>s, and I want to see if any of the ints in ListA are equal to any int in ListB. (They can be arrays if that makes life easier, but I think List<> has some built-in magic that might help.) And I'm sure this is a LINQ-friendly problem, but I'm working in 2.0 here.

My solution so far has been to foreach through ListA, and then foreach through ListB,

foreach (int a in ListA) {     foreach (int b in ListB)     {         if (a == b)         {             return true;         }     } } 

which was actually pretty slick when they were each three items long, but now they're 200 long and they frequently don't match, so we get the worst-case of N^2 comparisons. Even 40,000 comparisons go by pretty fast, but I think I might be missing something, since N^2 seems pretty naive for this particular problem.

Thanks!

like image 399
dnord Avatar asked Jan 07 '09 05:01

dnord


People also ask

How do you find the matching element in two lists?

You can use the IF Function to compare two lists in Excel for matches in the same row. If Function will return the value TRUE if the values match and FALSE if they don't.

How do I search for two lists in Python?

We can club the Python sort() method with the == operator to compare two lists. Python sort() method is used to sort the input lists with a purpose that if the two input lists are equal, then the elements would reside at the same index positions.


2 Answers

With LINQ, this is trivial, as you can call the Intersect extension method on the Enumerable class to give you the set intersection of the two arrays:

var intersection = ListA.Intersect(ListB); 

However, this is the set intersection, meaning if ListA and ListB don't have unique values in it, you won't get any copies. In other words if you have the following:

var ListA = new [] { 0, 0, 1, 2, 3 }; var ListB = new [] { 0, 0, 0, 2 }; 

Then ListA.Intersect(ListB) produces:

{ 0, 2 } 

If you're expecting:

{ 0, 0, 2 } 

Then you're going to have to maintain a count of the items yourself and yield/decrement as you scan the two lists.

First, you'd want to collect a Dictionary<TKey, int> with the lists of the individual items:

var countsOfA = ListA.GroupBy(i => i).ToDictionary(g => g.Key, g => g.Count()); 

From there, you can scan ListB and place that in a list when you come across an item in countsOfA:

// The items that match. IList<int> matched = new List<int>();  // Scan  foreach (int b in ListB) {     // The count.     int count;      // If the item is found in a.     if (countsOfA.TryGetValue(b, out count))     {         // This is positive.         Debug.Assert(count > 0);          // Add the item to the list.         matched.Add(b);          // Decrement the count.  If         // 0, remove.         if (--count == 0) countsOfA.Remove(b);     } } 

You can wrap this up in an extension method that defers execution like so:

public static IEnumerable<T> MultisetIntersect(this IEnumerable<T> first,     IEnumerable<T> second) {     // Call the overload with the default comparer.     return first.MultisetIntersect(second, EqualityComparer<T>.Default); }  public static IEnumerable<T> MultisetIntersect(this IEnumerable<T> first,     IEnumerable<T> second, IEqualityComparer<T> comparer) {     // Validate parameters.  Do this separately so check     // is performed immediately, and not when execution     // takes place.     if (first == null) throw new ArgumentNullException("first");     if (second == null) throw new ArgumentNullException("second");     if (comparer == null) throw new ArgumentNullException("comparer");      // Defer execution on the internal     // instance.     return first.MultisetIntersectImplementation(second, comparer); }  private static IEnumerable<T> MultisetIntersectImplementation(     this IEnumerable<T> first, IEnumerable<T> second,      IEqualityComparer<T> comparer) {     // Validate parameters.     Debug.Assert(first != null);     Debug.Assert(second != null);     Debug.Assert(comparer != null);      // Get the dictionary of the first.     IDictionary<T, long> counts = first.GroupBy(t => t, comparer).         ToDictionary(g => g.Key, g.LongCount(), comparer);      // Scan      foreach (T t in second)     {         // The count.         long count;          // If the item is found in a.         if (counts.TryGetValue(t, out count))         {             // This is positive.             Debug.Assert(count > 0);              // Yield the item.             yield return t;              // Decrement the count.  If             // 0, remove.             if (--count == 0) counts.Remove(t);         }     } } 

Note that both of these approaches are (and I apologize if I'm butchering Big-O notation here) O(N + M) where N is the number of items in the first array, and M is the number of items in the second array. You have to scan each list only once, and it's assumed that getting the hash codes and performing lookups on the hash codes is an O(1) (constant) operation.

like image 194
casperOne Avatar answered Oct 05 '22 01:10

casperOne


Load the whole of ListA into a HashSet instance, and then test foreach item in ListB against the HastSet: I'm pretty sure that this would be O(N).

//untested code ahead HashSet<int> hashSet = new HashSet<int>(ListA); foreach (int i in ListB) {     if (hashSet.Contains(i))         return true; } 

Here's the same thing in one line:

return new HashSet<int>(ListA).Overlaps(ListB); 

HashSet doesn't exist in .NET 3.5, so in .NET 2.0 you can use Dictionary<int,object> (instead of using HashSet<int>), and always store null as the object/value in the Dictionary since you're only interested in the keys.

like image 38
ChrisW Avatar answered Oct 05 '22 03:10

ChrisW