Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if one collection of values contains another

Tags:

Suppose I have two collections as follows:

Collection1: "A1" "A1" "M1" "M2"

Collection2: "M2" "M3" "M1" "A1" "A1" "A2"

all the values are string values. I want to know if all the elements in Collection1 are contained in Collection2, but I have no guarantee on the order and a set may have multiple entries with the same value. In this case, Collection2 does contain Collection1 because Collection2 has two A1's, M1 and M2. Theres the obvious way: sorting both collections and popping off values as i find matches, but I was wondering if there's a faster more efficient way to do this. Again with the initial collections I have no guarantee on the order or how many times a given value will appear

EDIT: Changed set to collection just to clear up that these aren't sets as they can contain duplicate values

like image 798
Megatron Avatar asked Mar 02 '11 02:03

Megatron


People also ask

How to check if an element is present in a collection?

The contains (Object element) of java.util.Collection interface is used to check whether the element ‘element’ exists in this collection. This method returns a boolean value depicting the presence of the element. If the element is present, it returns true, else it returns false. Attention reader! Don’t stop learning now.

What is collection contains () method in Java?

Collection contains () method in Java with Examples Last Updated : 29 Nov, 2018 The contains (Object element) of java.util.Collection interface is used to check whether the element ‘element’ exists in this collection. This method returns a boolean value depicting the presence of the element.

Are all the values in Collection1 contained in collection2?

all the values are string values. I want to know if all the elements in Collection1 are contained in Collection2, but I have no guarantee on the order and a set may have multiple entries with the same value. In this case, Collection2 does contain Collection1 because Collection2 has two A1's, M1 and M2.

How to check if cell contains one of several values in Excel?

How to check if cell contains one of several values in Excel? 1 Check if a cell contains one of several values from a list with formulas. ... 2 Display the matches if cell contains one of several values from a list with formulas. ... 3 Highlight the matches if cell contains one of several values from a list with a handy feature. ... More items...


4 Answers

The most concise way I know of:

//determine if Set2 contains all of the elements in Set1
bool containsAll = Set1.All(s => Set2.Contains(s));
like image 143
w.brian Avatar answered Nov 16 '22 18:11

w.brian


Yes, there is a faster way, provided you're not space-constrained. (See space/time tradeoff.)

The algorithm:

Just insert all the elements in Set2 into a hashtable (in C# 3.5, that's a HashSet<string>), and then go through all the elements of Set1 and check if they're in the hashtable. This method is faster (Θ(m + n) time complexity), but uses O(n) space.

Alternatively, just say:

bool isSuperset = new HashSet<string>(set2).IsSupersetOf(set1);

Edit 1:

For those people concerned about the possibility of duplicates (and hence the misnomer "set"), the idea can easily be extended:

Just make a new Dictionary<string, int> representing the count of each word in the super-list (add one to the count each time you see an instance of an existing word, and add the word with a count of 1 if it's not in the dictionary), and then go through the sub-list and decrement the count each time. If every word exists in the dictionary and the count is never zero when you try to decrement it, then the subset is in fact a sub-list; otherwise, you had too many instances of a word (or it didn't exist at all), so it's not a real sub-list.


Edit 2:

If the strings are very big and you're concerned about space efficiency, and an algorithm that works with (very) high probability works for you, then try storing a hash of each string instead. It's technically not guaranteed to work, but the probability of it not working is pretty darn low.

like image 36
user541686 Avatar answered Nov 16 '22 17:11

user541686


The problem I see with the HashSet, Intersect, and other Set theory answers is that you do contain duplicates, and "A set is a collection that contains no duplicate elements". Here's a way to handle the duplicate cases.

var list1 = new List<string> { "A1", "A1", "M1", "M2" };
var list2 = new List<string> { "M2", "M3", "M1", "A1", "A1", "A2" };

// Remove returns true if it was able to remove it, and it won't be there to be matched again if there's a duplicate in list1
bool areAllPresent = list1.All(i => list2.Remove(i));

EDIT: I renamed from Set1 and Set2 to list1 and list2 to appease Mehrdad.

EDIT 2: The comment implies it, but I wanted to explicitly state that this does alter list2. Only do it this way if you're using it as a comparison or control but don't need the contents afterwards.

like image 32
David Ruttka Avatar answered Nov 16 '22 19:11

David Ruttka


Check out linq...

string[] set1 = {"A1", "A1", "M1", "M2" };
string[]  set2 = { "M2", "M3", "M1", "A1", "A1", "A2" };

var matching = set1.Intersect(set2);

foreach (string x in matching)
{
    Console.WriteLine(x);
}
like image 45
John Sobolewski Avatar answered Nov 16 '22 17:11

John Sobolewski