 

Remove duplicates by key from a list of dictionaries in Python

Tags:

python

I am trying to remove the duplicates from the following list:

distinct_cur = [
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 195, 'st': 0.0, 'htc': 2, '_id': ObjectId('58e86a550a0aeff4e14ca6bb'), 'ftc': 0}, 
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 454, 'st': 0.8, 'htc': 1, '_id': ObjectId('58e8d03958ae6d179c2b4413'), 'ftc': 1},
    {'rtc': 0, 'vf': 2, 'mtc': 1, 'doc': 'test', 'foc': 45, 'st': 0.8, 'htc': 12, '_id': ObjectId('58e8d03958ae6d180c2b4446'), 'ftc': 0}
]

Two dictionaries count as duplicates if the text under their 'doc' key is the same; in that case only one of them should be kept. I have tried the following solution:

distinct_cur = [dict(y) for y in set(tuple(x.items()) for x in cur)]

But duplicates are still present in the final list.

Below is the desired output. The 1st and 2nd entries of distinct_cur have the same 'doc' value ('good job'), so only one of them is kept:

[
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 195, 'st': 0.0, 'htc': 2, '_id': ObjectId('58e86a550a0aeff4e14ca6bb'), 'ftc': 0}, 
    {'rtc': 0, 'vf': 2, 'mtc': 1, 'doc': 'test', 'foc': 45, 'st': 0.8, 'htc': 12, '_id': ObjectId('58e8d03958ae6d180c2b4446'), 'ftc': 0}
]
shanky asked Apr 10 '17 09:04




4 Answers

You're creating a set out of different elements and expect that it will remove the duplicates based on a criterion that only you know.
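A minimal sketch of why the set-based attempt leaves the list unchanged. It uses a shortened version of the question's data, and the `_id` values are plain strings rather than `ObjectId` instances, which is an assumption made only to keep the example self-contained.

```python
# Two records that share 'doc' but differ in other fields.
# '_id' is a plain string here (assumption) instead of a bson ObjectId.
records = [
    {'doc': 'good job', 'foc': 195, '_id': '58e86a550a0aeff4e14ca6bb'},
    {'doc': 'good job', 'foc': 454, '_id': '58e8d03958ae6d179c2b4413'},
]

# Each dict becomes a tuple of ALL its (key, value) pairs.
as_tuples = {tuple(r.items()) for r in records}

# The tuples differ in 'foc' and '_id', so the set keeps both of them.
still_duplicated = len(as_tuples) == len(records)
```

The set compares whole records, so it can only drop entries that match in every field.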

You have to iterate through your list and add a dictionary to the result list only if its 'doc' value has not been seen in a previous item, for instance like this:

done = set()
result = []
for d in distinct_cur:
    if d['doc'] not in done:
        done.add(d['doc'])  # note it down for further iterations
        result.append(d)

That keeps only the first occurrence of each group of dictionaries sharing the same 'doc' value, by recording the values already seen in an auxiliary set.
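The loop above could be wrapped in a reusable helper along these lines. The name `dedupe_by_key` and the shortened sample records are hypothetical, and the sketch assumes every record contains the key and that its value is hashable.

```python
def dedupe_by_key(records, key):
    """Keep the first record seen for each distinct value of record[key]."""
    seen = set()
    result = []
    for record in records:
        value = record[key]  # raises KeyError if a record lacks the key
        if value not in seen:
            seen.add(value)  # remember the value for later iterations
            result.append(record)
    return result


# Shortened, hypothetical version of the question's data.
sample = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]
unique = dedupe_by_key(sample, 'doc')
```

The set lookup makes this a single pass over the list, and the result keeps the original order of the surviving records.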

Another possibility is to build a dictionary keyed on each item's 'doc' value, iterating over the list backwards so that earlier items overwrite later ones (wrap the values in list() to get a list rather than a view on Python 3):

result = list({i['doc']: i for i in reversed(distinct_cur)}.values())
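A small sketch of what the reversed dictionary comprehension produces, using shortened, hypothetical sample records. It assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
# Shortened, hypothetical version of the question's data.
sample = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# Walking backwards means the earliest record for each 'doc' is assigned last,
# so it is the one that remains in the dictionary.
by_doc = {r['doc']: r for r in reversed(sample)}
result = list(by_doc.values())

# Keys were first inserted while walking backwards, so the output order
# follows the reversed traversal, not the original list order.
kept_foc = [r['foc'] for r in result]
```

The first occurrence of each 'doc' survives, but the records come out in the order their keys were first met during the backwards walk, which matters if the original ordering is significant.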
Jean-François Fabre answered Oct 11 '22 05:10


I see two similar solutions, and which one fits depends on your problem domain: do you want to keep the first instance of each key or the last?

Using the last (so as to overwrite the previous matches) is simpler:

d = {r['doc']: r for r in distinct_cur}.values()
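To make the first-versus-last choice concrete, here is a sketch that runs both on shortened, hypothetical sample records. The keep-first variant uses `dict.setdefault`, which is a swapped-in technique and not something from the answer itself.

```python
# Shortened, hypothetical version of the question's data.
sample = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# Keep the LAST record per 'doc': later records overwrite earlier ones.
last_wins = list({r['doc']: r for r in sample}.values())

# Keep the FIRST record per 'doc': setdefault only stores a value
# when the key is not present yet.
first_by_doc = {}
for r in sample:
    first_by_doc.setdefault(r['doc'], r)
first_wins = list(first_by_doc.values())
```

Both variants make one pass over the list and differ only in which of the competing records is retained.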
smassey answered Oct 11 '22 04:10


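A one-liner that deduplicates the list of dictionaries distinct_cur, treating 'doc' as the primary key. It keeps the last occurrence of each 'doc' value and rescans the remainder of the list for every element, so it takes quadratic time: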

[i for n, i in enumerate(distinct_cur) if i.get('doc') not in [y.get('doc') for y in distinct_cur[n + 1:]]]
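For longer lists, the same result could be obtained in a single pass. The sketch below compares the one-liner with a linear-time alternative on shortened, hypothetical sample records; it assumes the 'doc' values are hashable.

```python
# Shortened, hypothetical version of the question's data.
sample = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# The one-liner: keep a record only if its 'doc' does not appear later on.
quadratic = [
    i for n, i in enumerate(sample)
    if i.get('doc') not in [y.get('doc') for y in sample[n + 1:]]
]

# Linear alternative: walk backwards so the first time a 'doc' is met
# is its last occurrence, then restore the original order.
seen = set()
kept_reversed = []
for record in reversed(sample):
    doc = record.get('doc')
    if doc not in seen:
        seen.add(doc)
        kept_reversed.append(record)
linear = kept_reversed[::-1]
```

Both versions keep the last record for each 'doc' in original list order; only the amount of work differs.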
Alec answered Oct 11 '22 04:10


Try this:

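distinct_cur = [dict(t) for t in set(tuple(d.items()) for d in distinct_cur)]

Worked for me, but note that this only removes dictionaries that are identical in every key and value, and it requires all values to be hashable. Entries that share 'doc' but differ in other fields are kept.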

Yuval Pruss answered Oct 11 '22 05:10