Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find duplicates in python list of dictionaries

I have kind of dictionary below:

a = [{'un': 'a', 'id': "cd"}, {'un': 'b', 'id': "cd"},{'un': 'b', 'id':    "cd"}, {'un': 'c', 'id': "vd"},
    {'un': 'c', 'id': "a"}, {'un': 'c', 'id': "vd"}, {'un': 'a', 'id': "cm"}]

I need to find the duplicates of dictionaries by 'un' key, for example this {'un': 'a', 'id': "cd"} and this {'un': 'a', 'id': "cm"} dicts are duplicates by value of key 'un' secondly when the duplicates are found I need to make decision what dict to keep concerning its second value of the key 'id', for example we keep dict with pattern value "cm".

I have already made the firs step see the code below:

from collections import defaultdict
temp_ids = []
dup_dict = defaultdict(list)
for number, row  in enumerate(a):
    id = row['un']
    if id not in temp_ids:
        temp_ids.append(id)
    else:
        tally[id].append(number)

Using this code I more or less able to find indexes of duplicate lists, maybe there is other method to do it. And also I need the next step code that makes decision what dict keep and what omit. Will be very grateful for help.

like image 617
Yan Avatar asked Jul 27 '16 19:07

Yan


People also ask

What is a duplicate dictionary in Python?

A duplicate dictionary would be one that has the same values for both keys ‘name’ and ‘score’. With a list comprehension we can generate a list of lists where each list contains both values for each dictionary:

How to check if a list has duplicate elements in Python?

Check if a list has duplicate elements using the count() method Check if a list has duplicate elements using the counter() method Conclusion Check if a list has duplicate Elements using Sets We know that sets in Python contain only unique elements. We can use this property of sets to check if a list has duplicate elements or not.

How to sort a list without duplicates in Python?

If you want the list to be sorted you can do it using the list sort () method in the get_list_without_duplicates () function before the return statement.

Are there duplicate values in the dictionary while the keys remain unique?

While dealing with dictionaries, we may come across situation when there are duplicate values in the dictionary while obviously the keys remain unique. In this article we will see how we can achieve thi.


1 Answers

In general if you want to find duplicates in a list of dictionaries you should categorize your dictionaries in a way that duplicate ones stay in same groups. For that purpose you need to categorize based on dict items. Now, since for dictionaries Order is not an important factor you need to use a container that is both hashable and doesn't keep the order of its container. A frozenset() is the best choice for this task.

Example:

In [87]: lst = [{2: 4, 6: 0},{20: 41, 60: 88},{5: 10, 2: 4, 6: 0},{20: 41, 60: 88},{2: 4, 6: 0}]

In [88]: result = defaultdict(list)

In [89]: for i, d in enumerate(lst):
    ...:     result[frozenset(d.items())].append(i)
    ...:     
In [91]: result
Out[91]: 
defaultdict(list,
            {frozenset({(2, 4), (6, 0)}): [0, 4],
             frozenset({(20, 41), (60, 88)}): [1, 3],
             frozenset({(2, 4), (5, 10), (6, 0)}): [2]})

And in this case, you can categorize your dictionaries based on 'un' key then choose the expected items based on id:

>>> from collections import defaultdict
>>> 
>>> d = defaultdict(list)
>>> 
>>> for i in a:
...     d[i['un']].append(i)
... 
>>> d
defaultdict(<type 'list'>, {'a': [{'un': 'a', 'id': 'cd'}, {'un': 'a', 'id': 'cm'}], 'c': [{'un': 'c', 'id': 'vd'}, {'un': 'c', 'id': 'a'}, {'un': 'c', 'id': 'vd'}], 'b': [{'un': 'b', 'id': 'cd'}, {'un': 'b', 'id': 'cd'}]})
>>> 
>>> keeps = {'a': 'cm', 'b':'cd', 'c':'vd'} # the key is 'un' and the value is 'id' should be keep for that 'un'
>>> 
>>> [i for key, val in d.items() for i in val if i['id']==keeps[key]]
[{'un': 'a', 'id': 'cm'}, {'un': 'c', 'id': 'vd'}, {'un': 'c', 'id': 'vd'}, {'un': 'b', 'id': 'cd'}, {'un': 'b', 'id': 'cd'}]
>>> 

In the last line (the nested list comprehension) we loop over the aggregated dict's items then over the values and keep those items within the values that follows or condition which is i['id']==keeps[key] that means we will keep the items that has an id with specified values in keeps dictionary.

You can beak the list comprehension to something like this:

final_list = []
for key, val in d.items():
    for i in val:
        if i['id']==keeps[key]:
             final_list.append(i)

Note that since the iteration of list comprehensions has performed in C it's very faster than regular python loops and in the pythonic way to go. But if the performance is not important for you you can use the regular approach.

like image 108
Mazdak Avatar answered Sep 17 '22 12:09

Mazdak