
Sort and Compare List of Dicts Python

I am trying to find a way to sort and compare two lists of dictionaries in Python 3.6. Ultimately I want list_dict_a == list_dict_b to evaluate to True.

Here is what the data looks like:

list_dict_a = [
{'expiration_date': None, 'identifier_country': None, 'identifier_number': 'Male', 'identifier_type': 'Gender', 'issue_date': None},
{'expiration_date': None, 'identifier_country': 'VE', 'identifier_number': '1234567', 'identifier_type': 'Foo No.', 'issue_date': None}]

list_dict_b = [
{'identifier_country': 'VE', 'expiration_date': None, 'identifier_type': 'Foo No.', 'issue_date': None, 'identifier_number': '1234567'},
{'identifier_country': None, 'expiration_date': None, 'identifier_type': 'Gender', 'issue_date': None, 'identifier_number': 'Male'}]

The data is the same, but it comes in different orders (I don't have any control over the initial order).

When I compare them directly, for example with print("does this match anything", list_dict_a == list_dict_b), I get False.

Is this even possible to do?

unseen_damage asked Dec 20 '17 19:12


People also ask

Can you compare Dicts in Python?

You can use the == operator, which compares two dicts by their key-value pairs regardless of insertion order. More specific needs take extra code, because Python has no single built-in function that, for example, reports how many key-value pairs two dictionaries have in common.
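A minimal sketch of both cases, using small illustrative dicts shaped like the question's data (the dicts a, b and c are made up for this example). It assumes all values are hashable, which is what lets the set operations on dict.items() views count the shared pairs.

```python
# Illustrative dicts: a and b hold the same pairs in a different insertion order
a = {'identifier_type': 'Gender', 'identifier_number': 'Male', 'issue_date': None}
b = {'issue_date': None, 'identifier_number': 'Male', 'identifier_type': 'Gender'}
c = {'identifier_type': 'Gender', 'identifier_number': 'Female', 'issue_date': None}

# == compares dicts by their (key, value) pairs; insertion order is ignored
print(a == b)  # True
print(a == c)  # False

# Item views support set operations when every value is hashable (str and None here),
# so the intersection holds exactly the pairs both dicts share
shared = a.items() & c.items()
print(len(shared))  # 2
```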

How do you sort a list of key-value pairs in Python?

To sort a list of dictionaries by the value of a specific key, pass a key function to the sort() method or the sorted() function. The function is applied to each element of the list, and the list is ordered by the results.
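A short sketch of sorting by one key, with illustrative records modeled on the question's data: sorted() returns a new list, while list.sort() reorders the list in place.

```python
records = [
    {'identifier_type': 'Gender', 'identifier_number': 'Male'},
    {'identifier_type': 'Foo No.', 'identifier_number': '1234567'},
]

# sorted() returns a new list ordered by the value stored under 'identifier_type'
by_type = sorted(records, key=lambda d: d['identifier_type'])
print([d['identifier_type'] for d in by_type])  # ['Foo No.', 'Gender']

# list.sort() applies the same key but modifies records in place
records.sort(key=lambda d: d['identifier_type'])
```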

How do I sort multiple dictionaries in Python?

Pass either a lambda function or operator.itemgetter as the key argument of sorted() or list.sort(). To sort by several keys at once, make the key function return a tuple, for example itemgetter('a', 'b').
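A sketch of sorting by two keys with operator.itemgetter, checked against the equivalent lambda. The records are illustrative; the third one is made up so that two entries share the same type.

```python
from operator import itemgetter

records = [
    {'identifier_type': 'Gender', 'identifier_number': 'Male'},
    {'identifier_type': 'Foo No.', 'identifier_number': '7654321'},
    {'identifier_type': 'Foo No.', 'identifier_number': '1234567'},
]

# itemgetter('a', 'b') returns a function mapping d to (d['a'], d['b']),
# so records are ordered by type first and by number within each type
by_type_then_number = sorted(records, key=itemgetter('identifier_type', 'identifier_number'))

# The equivalent lambda gives the same order
same = sorted(records, key=lambda d: (d['identifier_type'], d['identifier_number']))
assert by_type_then_number == same

print([d['identifier_number'] for d in by_type_then_number])
# ['1234567', '7654321', 'Male']
```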

How do you sort a list of tuples in Python?

To sort a list of tuples in descending order, pass reverse=True to the sort() method or the sorted() function. Tuples compare element by element, so the first element decides the order and later elements only break ties; add key=lambda t: t[0] to sort by the first element alone.
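A minimal example with made-up tuples, sorting by the first element only, largest first:

```python
pairs = [(2, 'b'), (3, 'c'), (1, 'a')]

# key picks the first element; reverse=True puts the largest first
by_first_desc = sorted(pairs, key=lambda t: t[0], reverse=True)
print(by_first_desc)  # [(3, 'c'), (2, 'b'), (1, 'a')]

# list.sort() is the in-place variant
pairs.sort(key=lambda t: t[0], reverse=True)
```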


1 Answer

You can sort both lists before comparing them and compare the sorted results:

>>> list_dict_a = [
...     {'expiration_date': None, 'identifier_country': None, 'identifier_number': 'Male', 'identifier_type': 'Gender', 'issue_date': None},
...     {'expiration_date': None, 'identifier_country': 'VE', 'identifier_number': '1234567', 'identifier_type': 'Foo No.', 'issue_date': None}]

>>> list_dict_b = [
...     {'identifier_country': 'VE', 'expiration_date': None, 'identifier_type': 'Foo No.', 'issue_date': None, 'identifier_number': '1234567'},
...     {'identifier_country': None, 'expiration_date': None, 'identifier_type': 'Gender', 'issue_date': None, 'identifier_number': 'Male'}]

>>> list_dict_a == list_dict_b
False
>>> def key_func(d):
...     # Replace None with '' so that every value is a str and the pairs can be ordered
...     items = ((k, v if v is not None else '') for k, v in d.items())
...     return sorted(items)
...
>>> sorted(list_dict_a, key=key_func) == sorted(list_dict_b, key=key_func)
True

The order of the dicts within each list will then not matter.
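Why sorting is enough can be checked on a toy example (the lists a and b below are illustrative, not from the question): == on two dicts ignores the order in which keys were inserted, while == on two lists compares elements position by position. So the only mismatch left to fix is the order of the dicts inside each list.

```python
# Toy data: the same two dicts, with key order and list positions shuffled
a = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}]
b = [{'y': 4, 'x': 3}, {'y': 2, 'x': 1}]

print(a[0] == b[1])  # True: dict == ignores key insertion order
print(a == b)        # False: list == compares position by position
```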

Passing the key function is needed because dicts are not orderable in Python 3: comparing two dicts with < raises a TypeError, so sorted() cannot order them on its own. The key function maps each dict to a value that can be compared instead; here that key is simply the sorted list of the dict's (key, value) pairs.
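A small sketch of the difference, using two illustrative one-key dicts: without a key, sorted() has to compare dict with dict and fails; with a key, it only ever compares the keys.

```python
records = [{'identifier_type': 'Gender'}, {'identifier_type': 'Foo No.'}]

error = None
try:
    sorted(records)  # no key: sorting would need dict < dict
except TypeError as exc:
    error = exc
print(type(error).__name__, error)

# With a key, sorted() compares the sorted item lists, never the dicts themselves
ordered = sorted(records, key=lambda d: sorted(d.items()))
print(ordered)  # [{'identifier_type': 'Foo No.'}, {'identifier_type': 'Gender'}]
```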

The key function calculates a key for each dict as follows:

>>> dict_a0 = list_dict_a[0]
>>> key_func(dict_a0)
[('expiration_date', ''), ('identifier_country', ''), ('identifier_number', 'Male'), ('identifier_type', 'Gender'), ('issue_date', '')]

Footnotes

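For these lists of (key, value) pairs to be comparable with each other, None values had to be converted to an empty string. In Python 3, None cannot be ordered against a str (None < 'VE' raises a TypeError), so two pairs with the same key, one holding None and the other a string, could not otherwise be compared.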
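The failure can be reproduced with single-pair lists taken from the question's data: the keys tie, so Python goes on to compare the values, and None against 'VE' raises.

```python
with_none = [('identifier_country', None)]
with_str = [('identifier_country', 'VE')]

# The keys are equal, so the comparison falls through to None < 'VE', which fails
error = None
try:
    with_none < with_str
except TypeError as exc:
    error = exc
print(type(error).__name__, error)

# After mapping None to '', both values are str and the comparison succeeds
with_empty = [('identifier_country', '')]
print(with_empty < with_str)  # True
```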

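The underlying assumption in the solution above is that all dictionary values in your case are either strings or None, and that "empty" values are consistently represented as None (and not, e.g., by an empty string). Otherwise a dict holding None and a dict holding '' under the same key can get identical sort keys; sorting then leaves them in their original relative order, and the final == can report False for lists that do hold the same dicts. If the data does not fit these assumptions, key_func() has to be adjusted to ensure that the resulting lists are always comparable with each other for any dict value expected in the data.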
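One possible adjustment, sketched under stated assumptions: the hypothetical key_func_repr below sorts (key, repr(value)) pairs instead. repr() turns every value into a str, so mixed value types can be ordered, and None ('None') stays distinct from '' ("''"). This assumes the dict keys are mutually comparable (e.g. all str) and that repr() is consistent with equality for the values in the data (it is not for 1 and 1.0, for instance). The sample lists are made up.

```python
def key_func_repr(d):
    """Hypothetical variant of key_func: sorted (key, repr(value)) pairs.

    Every value becomes a str via repr(), so values of mixed types can be
    ordered, and None and '' no longer collapse to the same sort key.
    """
    return sorted((k, repr(v)) for k, v in d.items())


# Made-up data mixing int, str, None and '' values
list_a = [{'n': None, 'm': 1}, {'n': '', 'm': 'x'}]
list_b = [{'m': 'x', 'n': ''}, {'m': 1, 'n': None}]

# The final == still compares the original dicts; the key only fixes the order
print(sorted(list_a, key=key_func_repr) == sorted(list_b, key=key_func_repr))  # True
```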

Also, for large dicts this key function might not be ideal, because building and comparing the sorted lists of pairs could be slow. It would then be better to map each dict to a hashable value that is equal for dicts that compare equal, and to compare those values instead of sorting.
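A sketch of that idea, assuming every dict value is hashable (true for the str and None values in the question). Instead of a hand-rolled hash, each dict is turned into a frozenset of its items, which is hashable and equal for dicts that compare equal; collections.Counter then compares the two lists as multisets, so duplicates are counted and no sorting is needed. The helper name same_dicts is hypothetical.

```python
from collections import Counter


def same_dicts(list_a, list_b):
    """Hypothetical helper: True if both lists hold the same dicts, in any order."""
    def as_counter(dicts):
        # frozenset(d.items()) is hashable and order-independent
        return Counter(frozenset(d.items()) for d in dicts)

    return as_counter(list_a) == as_counter(list_b)


list_dict_a = [
    {'expiration_date': None, 'identifier_country': None, 'identifier_number': 'Male', 'identifier_type': 'Gender', 'issue_date': None},
    {'expiration_date': None, 'identifier_country': 'VE', 'identifier_number': '1234567', 'identifier_type': 'Foo No.', 'issue_date': None}]
list_dict_b = [
    {'identifier_country': 'VE', 'expiration_date': None, 'identifier_type': 'Foo No.', 'issue_date': None, 'identifier_number': '1234567'},
    {'identifier_country': None, 'expiration_date': None, 'identifier_type': 'Gender', 'issue_date': None, 'identifier_number': 'Male'}]

print(same_dicts(list_dict_a, list_dict_b))                # True
print(same_dicts(list_dict_a, list_dict_a + list_dict_a))  # False: duplicates are counted
```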

plamut answered Oct 03 '22 16:10