I have a list of dictionaries in Python, which looks like the following:
d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100}, {'feature_a':2, 'feature_b':'Jul', 'feature_c':150}, {'feature_a':1, 'feature_b':'Mar', 'feature_c':110}, ...]
What I want to achieve is to keep the combinations of feature_a, feature_b and feature_c unique.
For example, if we have 3 entries which have the same feature_a and feature_b, but feature_c values of 100, 100 and 150, then after the operation only two entries should remain, with feature_c values 100 and 150.
How can I achieve this?
UPDATE:
OK, thanks to Anand for the excellent answer; it works perfectly. However, I have a further question.
Suppose we have a new feature_d, so the list looks like:
d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'}, {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d':'B'}, {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'}, ...]
and I only want to deduplicate on feature_a, feature_b and feature_c, leaving feature_d out of the comparison. How can I achieve this?
Many thanks.
If the order of the initial d list is not important, you can take the .items() of each dictionary and convert them into a frozenset(), which is hashable (this requires every value in the dictionaries to be hashable as well). You can then put all of those into a set() or frozenset() to drop the duplicates, and convert each remaining frozenset() back to a dictionary. Example -
uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
Sets do not allow duplicate elements, though you lose the order of the list. In Python 2.x, the list(...) is not needed, as map() already returns a list.
Example/Demo -
>>> import pprint
>>> pprint.pprint(d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
{'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
{'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150}]
>>> uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
>>> pprint.pprint(uniq_d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
{'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
{'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150}]
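If you do need to keep the original order, one variant of the same frozenset idea (swapped in here, not part of the answer above) is to pass the frozensets through dict.fromkeys(), which keeps one entry per distinct key in the order the keys were first seen (dicts preserve insertion order from Python 3.7 on). This is a minimal sketch that, like the one-liner above, assumes every value in the dictionaries is hashable.

```python
d = [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
     {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150}]

# Each frozenset of (key, value) pairs is hashable, so it can be a dict key.
# dict.fromkeys() keeps one entry per distinct frozenset, in the order the
# frozensets were first seen (insertion order is guaranteed in Python 3.7+).
uniq_d = [dict(fs) for fs in dict.fromkeys(frozenset(i.items()) for i in d)]

# Caveat: the trick needs hashable values; a list value raises TypeError
# as soon as the frozenset tries to hash the ('feature_a', [1, 2]) pair.
error = None
try:
    frozenset({'feature_a': [1, 2]}.items())
except TypeError as exc:
    error = exc
```

The keys inside each rebuilt dictionary may come back in a different order, because a frozenset has no order of its own; the dictionaries still compare equal to the originals.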
For the new requirement -
"However, what if I have another feature_d but I only want to dedup feature_a, _b and _c?"
"If two entries have the same feature_a, _b and _c, they are considered duplicates, no matter what is in feature_d."
A simple way to do this is to use a set and a new list: add only the features you care about (as a tuple) to the set, and check for membership using only those features. This keeps the first occurrence of each combination and preserves the order of the list. Example -
seen_set = set()
new_d = []
for i in d:
    if tuple([i['feature_a'],i['feature_b'],i['feature_c']]) not in seen_set:
        new_d.append(i)
        seen_set.add(tuple([i['feature_a'],i['feature_b'],i['feature_c']]))
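The same seen-set pattern generalizes to any subset of keys. The sketch below wraps it in a hypothetical helper, dedup_by (the name is illustrative, not an existing API), which takes the key names to compare on and uses the standard library's operator.itemgetter to build the lookup key. It keeps the first row for each combination and preserves the list order; it assumes at least one key name is passed and that the selected values are hashable.

```python
from operator import itemgetter


def dedup_by(rows, keys):
    """Hypothetical helper: drop rows whose values for `keys` were already seen.

    Keeps the first row for each combination and preserves the original order.
    `keys` must contain at least one key name.
    """
    get_key = itemgetter(*keys)  # returns a tuple when more than one key is given
    seen = set()
    result = []
    for row in rows:
        k = get_key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


d = [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'G'}]

# Compare on feature_a, feature_b and feature_c only; feature_d is ignored.
new_d = dedup_by(d, ['feature_a', 'feature_b', 'feature_c'])
```

Passing a different list of key names, for example ['feature_b'], deduplicates on just that column without changing the loop.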
Example/Demo -
>>> d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'},
... {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d': 'B'},
... {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'},
... {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'G'}]
>>> seen_set = set()
>>> new_d = []
>>> for i in d:
...     if tuple([i['feature_a'],i['feature_b'],i['feature_c']]) not in seen_set:
...         new_d.append(i)
...         seen_set.add(tuple([i['feature_a'],i['feature_b'],i['feature_c']]))
...
>>> pprint.pprint(new_d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
{'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
{'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'}]
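The loop above keeps the first entry for each combination. If you would rather keep the last one (for example, the most recent feature_d), one alternative, sketched here rather than taken from the answer, is a dict comprehension keyed on the same tuple: later rows overwrite earlier rows with the same key, while each key stays at the position where it was first inserted (Python 3.7+ dict ordering).

```python
d = [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
     {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'},
     {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'G'}]

# Later rows with the same (a, b, c) tuple overwrite earlier ones, so each key
# ends up mapped to its last occurrence; key order follows first insertion.
last_by_key = {(i['feature_a'], i['feature_b'], i['feature_c']): i for i in d}
new_d = list(last_by_key.values())
```

Here the 'Mar'/110 combination keeps the row with feature_d 'G' instead of 'F', but it still appears third in the result.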