Remove duplicates by key from a list of dictionaries in Python

Tags:

python

I am trying to remove duplicates from the following list of dictionaries:

distinct_cur = [
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 195, 'st': 0.0, 'htc': 2, '_id': ObjectId('58e86a550a0aeff4e14ca6bb'), 'ftc': 0}, 
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 454, 'st': 0.8, 'htc': 1, '_id': ObjectId('58e8d03958ae6d179c2b4413'), 'ftc': 1},
    {'rtc': 0, 'vf': 2, 'mtc': 1, 'doc': 'test', 'foc': 45, 'st': 0.8, 'htc': 12, '_id': ObjectId('58e8d03958ae6d180c2b4446'), 'ftc': 0}
]

The condition is: if two dictionaries have the same text as their 'doc' value, one of them should be removed. I have tried the following solution:

distinct_cur = [dict(y) for y in set(tuple(x.items()) for x in cur)]

But duplicates are still present in the final list.

Below is the desired output. The 1st and 2nd dictionaries in distinct_cur have the same 'doc' value ('good job'), so only the first of them is kept:

[
    {'rtc': 0, 'vf': 0, 'mtc': 0, 'doc': 'good job', 'foc': 195, 'st': 0.0, 'htc': 2, '_id': ObjectId('58e86a550a0aeff4e14ca6bb'), 'ftc': 0}, 
    {'rtc': 0, 'vf': 2, 'mtc': 1, 'doc': 'test', 'foc': 45, 'st': 0.8, 'htc': 12, '_id': ObjectId('58e8d03958ae6d180c2b4446'), 'ftc': 0}
]


asked Apr 10 '17 09:04

shanky

4 Answers

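You're creating a set of tuples built from every key/value pair and expecting it to remove duplicates based on a criterion that only you know (the 'doc' key). Because the dictionaries also differ in other keys such as '_id' and 'foc', their tuples are all different, so the set keeps every one of them.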
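A quick way to see the problem is to compare the set of whole-dictionary tuples with the set of 'doc' values alone. This sketch uses a trimmed copy of the question's data (only 'doc', 'foc' and '_id' kept), and the '_id' values are plain strings instead of ObjectId, an assumption made so it runs without pymongo/bson installed.

```python
# Trimmed copy of the question's data; '_id' is a plain string here (assumption)
# so the snippet runs without pymongo/bson.
distinct_cur = [
    {'doc': 'good job', 'foc': 195, '_id': '58e86a550a0aeff4e14ca6bb'},
    {'doc': 'good job', 'foc': 454, '_id': '58e8d03958ae6d179c2b4413'},
    {'doc': 'test', 'foc': 45, '_id': '58e8d03958ae6d180c2b4446'},
]

# Each tuple holds *every* key/value pair, so two dicts that share 'doc'
# but differ in 'foc' or '_id' still produce different tuples.
as_tuples = {tuple(d.items()) for d in distinct_cur}
print(len(as_tuples))  # 3: nothing was removed

# Collecting only the 'doc' values shows how many distinct docs there really are.
doc_values = {d['doc'] for d in distinct_cur}
print(len(doc_values))  # 2
```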

You have to iterate through your list and add a dictionary to the result list only if its 'doc' value differs from all the previous ones, for instance like this:

done = set()
result = []
for d in distinct_cur:
    if d['doc'] not in done:
        done.add(d['doc'])  # note it down for further iterations
        result.append(d)

That keeps only the first occurrence of each 'doc' value, by recording the values already seen in an auxiliary set.
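The same loop can be wrapped in a small reusable function. The name dedupe_by_key is hypothetical, and the sample data is a trimmed copy of the question's list with '_id' omitted; the logic is exactly the seen-set loop above, parameterised on the key.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_key(items: Iterable[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep the first dict for each value of `key`, in original order."""
    seen = set()
    result = []
    for item in items:
        value = item[key]  # raises KeyError if a dict lacks `key`
        if value not in seen:
            seen.add(value)  # remember it so later duplicates are skipped
            result.append(item)
    return result


# Trimmed sample data ('_id' omitted for brevity).
distinct_cur = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]
unique = dedupe_by_key(distinct_cur, 'doc')
print([d['foc'] for d in unique])  # [195, 45]
```

The values of the chosen key must be hashable, since they are stored in a set; strings like 'doc' values are fine.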

Another possibility is to use a dictionary keyed on each item's "doc" value, iterating backwards over the list so that the first items overwrite the last ones:

result = list({i['doc']: i for i in reversed(distinct_cur)}.values())
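One detail worth checking is the order of the result: the reversed-dict trick keeps the first occurrence of each 'doc', but the dictionary's keys are inserted in reversed-list order, so the output is not in the original order. The sketch below shows that, then an alternative not taken from the answer, dict.setdefault, which stores only the first dict per key and so follows the original order (dicts preserve insertion order from Python 3.7). Sample data is trimmed from the question.

```python
# Trimmed sample data ('_id' omitted for brevity).
distinct_cur = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# Reversed iteration: the earliest dict per 'doc' is written last, so it wins.
by_doc = {d['doc']: d for d in reversed(distinct_cur)}
# .values() is a view in Python 3; list() turns it into a real list.
reversed_result = list(by_doc.values())
print([d['foc'] for d in reversed_result])  # [45, 195]: first occurrences, reversed order

# Alternative: setdefault stores a dict only if its key is not present yet,
# so the first occurrence wins and keys appear in first-seen order.
first_seen = {}
for d in distinct_cur:
    first_seen.setdefault(d['doc'], d)
ordered_result = list(first_seen.values())
print([d['foc'] for d in ordered_result])  # [195, 45]
```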

answered Oct 11 '22 05:10

Jean-François Fabre

I see two similar solutions, and which one fits depends on your domain problem: do you want to keep the first instance of a key or the last instance?

Using the last (so as to overwrite the previous matches) is simpler:

d = list({r['doc']: r for r in distinct_cur}.values())
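On the question's data this keeps the second 'good job' dictionary, because later entries overwrite earlier ones. A key keeps the position where it was first inserted, so the output order follows each 'doc' value's first appearance. The sample data below is a trimmed copy of the question's list.

```python
# Trimmed sample data ('_id' omitted for brevity).
distinct_cur = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]

# Later dicts overwrite earlier ones with the same 'doc', so the last one wins.
d = list({r['doc']: r for r in distinct_cur}.values())
print([r['foc'] for r in d])  # [454, 45]
```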

answered Oct 11 '22 04:10

smassey

A one-liner to deduplicate the list of dictionaries distinct_cur on the 'doc' key (it keeps the last occurrence of each value):

[i for n, i in enumerate(distinct_cur) if i.get('doc') not in [y.get('doc') for y in distinct_cur[n + 1:]]]
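The one-liner rebuilds a list of all later 'doc' values for every element, so it does quadratic work. A linear-time sketch with the same result scans the list backwards with a seen-set and then restores the original order; the helper name dedupe_keep_last is hypothetical, and it uses .get() like the one-liner, so a missing 'doc' is treated as None.

```python
def dedupe_keep_last(items, key):
    """Hypothetical helper: keep the last dict for each value of `key`, in original order."""
    seen = set()
    kept = []
    for item in reversed(items):  # the first hit while scanning backwards is the last occurrence
        value = item.get(key)  # like the one-liner: a missing key becomes None
        if value not in seen:
            seen.add(value)
            kept.append(item)
    kept.reverse()  # restore original relative order
    return kept


# Trimmed sample data ('_id' omitted for brevity).
distinct_cur = [
    {'doc': 'good job', 'foc': 195},
    {'doc': 'good job', 'foc': 454},
    {'doc': 'test', 'foc': 45},
]
one_liner = [i for n, i in enumerate(distinct_cur)
             if i.get('doc') not in [y.get('doc') for y in distinct_cur[n + 1:]]]
linear = dedupe_keep_last(distinct_cur, 'doc')
print(linear == one_liner)  # True
```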

answered Oct 11 '22 04:10

Alec

Try this:

distinct_cur = [dict(t) for t in set([tuple(d.items()) for d in distinct_cur])]

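Worked for me, although this only removes dictionaries that are identical in every key/value pair, so it does not deduplicate on 'doc' alone.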
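If removing exact duplicates is what you want, note that tuple(d.items()) depends on key insertion order: two equal dicts built in a different order give different tuples. A sketch of an order-preserving exact-duplicate filter that uses frozenset(d.items()) instead; dedupe_exact is a hypothetical name, and it assumes every value in the dicts is hashable.

```python
def dedupe_exact(items):
    """Hypothetical helper: drop dicts equal to an earlier one, keeping original order."""
    seen = set()
    result = []
    for d in items:
        # frozenset ignores key insertion order, unlike tuple(d.items());
        # it requires every value to be hashable.
        fingerprint = frozenset(d.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(d)
    return result


rows = [{'doc': 'test', 'foc': 45}, {'foc': 45, 'doc': 'test'}, {'doc': 'test', 'foc': 46}]
print(len(dedupe_exact(rows)))  # 2
print(len({tuple(d.items()) for d in rows}))  # 3: key order differs in the first two
```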

answered Oct 11 '22 05:10

Yuval Pruss
