Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I verify if one list is a subset of another?

Tags:

python

list

People also ask

How do you check if an item in one list is in another list Python?

There are 2 ways to understand check if the list contains elements of another list. First, use all() functions to check if a Python list contains all the elements of another list. And second, use any() function to check if the list contains any elements of another one.

How do you check if a set is a subset in Python?

Python Set issubset() The issubset() method returns True if set A is the subset of B , i.e. if all the elements of set A are present in set B . Else, it returns False .


>>> a = [1, 3, 5]
>>> b = [1, 3, 5, 8]
>>> c = [3, 5, 9]
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

>>> a = ['yes', 'no', 'hmm']
>>> b = ['yes', 'no', 'hmm', 'well']
>>> c = ['sorry', 'no', 'hmm']
>>> 
>>> set(a) <= set(b)
True
>>> set(c) <= set(b)
False

Use set.issubset

Example:

a = {1,2}
b = {1,2,3}
a.issubset(b) # True
a = {1,2,4}
b = {1,2,3}
a.issubset(b) # False

The performant function Python provides for this is set.issubset. It does have a few restrictions that make it unclear if it's the answer to your question, however.

A list may contain items multiple times and has a specific order. A set does not. Additionally, sets only work on hashable objects.

Are you asking about subset or subsequence (which means you'll want a string search algorithm)? Will either of the lists be the same for many tests? What are the datatypes contained in the list? And for that matter, does it need to be a list?

Your other post intersect a dict and list made the types clearer and did get a recommendation to use dictionary key views for their set-like functionality. In that case it was known to work because dictionary keys behave like a set (so much so that before we had sets in Python we used dictionaries). One wonders how the issue got less specific in three hours.


one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

all(x in two for x in one)

Explanation: Generator creating booleans by looping through list one checking if that item is in list two. all() returns True if every item is truthy, else False.

There is also an advantage that all return False on the first instance of a missing element rather than having to process every item.


Assuming the items are hashable

>>> from collections import Counter
>>> not Counter([1, 2]) - Counter([1])
False
>>> not Counter([1, 2]) - Counter([1, 2])
True
>>> not Counter([1, 2, 2]) - Counter([1, 2])
False

If you don't care about duplicate items eg. [1, 2, 2] and [1, 2] then just use:

>>> set([1, 2, 2]).issubset([1, 2])
True

Is testing equality on the smaller list after an intersection the fastest way to do this?

.issubset will be the fastest way to do it. Checking the length before testing issubset will not improve speed because you still have O(N + M) items to iterate through and check.


One more solution would be to use a intersection.

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one).intersection(set(two)) == set(one)

The intersection of the sets would contain of set one

(OR)

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(one) & (set(two)) == set(one)

Set theory is inappropriate for lists since duplicates will result in wrong answers using set theory.

For example:

a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
set(b) > set(a)

has no meaning. Yes, it gives a false answer but this is not correct since set theory is just comparing: 1,3,5 versus 1,3,4,5. You must include all duplicates.

Instead you must count each occurrence of each item and do a greater than equal to check. This is not very expensive, because it is not using O(N^2) operations and does not require quick sort.

#!/usr/bin/env python

from collections import Counter

def containedInFirst(a, b):
  a_count = Counter(a)
  b_count = Counter(b)
  for key in b_count:
    if a_count.has_key(key) == False:
      return False
    if b_count[key] > a_count[key]:
      return False
  return True


a = [1, 3, 3, 3, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

a = [1, 3, 3, 3, 4, 4, 5]
b = [1, 3, 3, 4, 5]
print "b in a: ", containedInFirst(a, b)

Then running this you get:

$ python contained.py 
b in a:  False
b in a:  True

one = [1, 2, 3]
two = [9, 8, 5, 3, 2, 1]

set(x in two for x in one) == set([True])

If list1 is in list 2:

  • (x in two for x in one) generates a list of True.

  • when we do a set(x in two for x in one) has only one element (True).