Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to efficiently compare two unordered lists (not sets) in Python?

People also ask

How do you check if two lists are equal without orders?

Using Counter() , we usually are able to get frequency of each element in list, checking for it, for both the list, we can check if two lists are identical or not. But this method also ignores the ordering of the elements in the list and only takes into account the frequency of elements.

How do I compare two lists and differences in Python?

The difference between two lists (say list1 and list2) can be found using the following simple function. By Using the above function, the difference can be found using diff(temp2, temp1) or diff(temp1, temp2) . Both will give the result ['Four', 'Three'] .


O(n): The Counter() method is best (if your objects are hashable):

def compare(s, t):
    return Counter(s) == Counter(t)

O(n log n): The sorted() method is next best (if your objects are orderable):

def compare(s, t):
    return sorted(s) == sorted(t)

O(n * n): If the objects are neither hashable, nor orderable, you can use equality:

def compare(s, t):
    t = list(t)   # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t

You can sort both:

sorted(a) == sorted(b)

A counting sort could also be more efficient (but it requires the object to be hashable).

>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print (Counter(a) == Counter(b))
True

If you know the items are always hashable, you can use a Counter() which is O(n)
If you know the items are always sortable, you can use sorted() which is O(n log n)

In the general case you can't rely on being able to sort, or has the elements, so you need a fallback like this, which is unfortunately O(n^2)

len(a)==len(b) and all(a.count(i)==b.count(i) for i in a)

The best way to do this is by sorting the lists and comparing them. (Using Counter won't work with objects that aren't hashable.) This is straightforward for integers:

sorted(a) == sorted(b)

It gets a little trickier with arbitrary objects. If you care about object identity, i.e., whether the same objects are in both lists, you can use the id() function as the sort key.

sorted(a, key=id) == sorted(b, key==id)

(In Python 2.x you don't actually need the key= parameter, because you can compare any object to any object. The ordering is arbitrary but stable, so it works fine for this purpose; it doesn't matter what order the objects are in, only that the ordering is the same for both lists. In Python 3, though, comparing objects of different types is disallowed in many circumstances -- for example, you can't compare strings to integers -- so if you will have objects of various types, best to explicitly use the object's ID.)

If you want to compare the objects in the list by value, on the other hand, first you need to define what "value" means for the objects. Then you will need some way to provide that as a key (and for Python 3, as a consistent type). One potential way that would work for a lot of arbitrary objects is to sort by their repr(). Of course, this could waste a lot of extra time and memory building repr() strings for large lists and so on.

sorted(a, key=repr) == sorted(b, key==repr)

If the objects are all your own types, you can define __lt__() on them so that the object knows how to compare itself to others. Then you can just sort them and not worry about the key= parameter. Of course you could also define __hash__() and use Counter, which will be faster.


If you have to do this in tests: https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual

assertCountEqual(first, second, msg=None)

Test that sequence first contains the same elements as second, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated.

Duplicate elements are not ignored when comparing first and second. It verifies whether each element has the same count in both sequences. Equivalent to: assertEqual(Counter(list(first)), Counter(list(second))) but works with sequences of unhashable objects as well.

New in version 3.2.

or in 2.7: https://docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual

Outside of tests I would recommend the Counter method.


If the comparison is to be performed in a testing context, use assertCountEqual(a, b) (py>=3.2) and assertItemsEqual(a, b) (2.7<=py<3.2).

Works on sequences of unhashable objects too.


If the list contains items that are not hashable (such as a list of objects) you might be able to use the Counter Class and the id() function such as:

from collections import Counter
...
if Counter(map(id,a)) == Counter(map(id,b)):
    print("Lists a and b contain the same objects")