I was looking for a set()-like method to deduplicate a list, except that the items in the original list are not hashable (they are dicts). I spent a while looking for something adequate, and I ended up writing this little function:
def deduplicate_list(lst, key):
    output = []
    keys = []
    for i in lst:
        # keep an item only if its value for `key` has not been seen yet
        if i[key] not in keys:
            output.append(i)
            keys.append(i[key])
    return output
Provided that a key is correctly given and is a string, this function does its job pretty well. Needless to say, if I learn about a built-in or a standard library module which provides the same functionality, I'll happily drop my little routine in favor of a more standard and robust choice.
Are you aware of such an implementation?
-- Note
The following one-liner, found in this answer,
[dict(t) for t in set([tuple(d.items()) for d in l])]
while clever, won't work because I have to work with items that are nested dicts.
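For what it's worth, here is a minimal sketch (with a made-up list l) of where it breaks down: the tuples produced by d.items() still contain the nested dicts, which are unhashable, so set() raises a TypeError.
l = [{"id": "1234", "attributes": {"handle": "jsmith"}}]
try:
    [dict(t) for t in set([tuple(d.items()) for d in l])]
except TypeError as exc:
    print(exc)  # unhashable type: 'dict' -- the nested dict cannot be hashed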
-- Example
For clarity, here is an example of using such a routine:
with_duplicates = [
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "[email protected]",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    },
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "[email protected]",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    }
]
without_duplicates = deduplicate_list(with_duplicates, key='id')
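Both entries share the same "id", so the expected result is a single-element list containing the first dict:
print(len(without_duplicates))        # 1
print(without_duplicates[0]["id"])    # 1234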
-- Answer
You are picking only the first dict in your list for every distinct value of key. itertools.groupby is the built-in tool that can do that for you - sort and group by key, then take only the first item from each group:
from itertools import groupby

def deduplicate(lst, key):
    fnc = lambda d: d.get(key)  # more robust than d[key]
    return [next(g) for k, g in groupby(sorted(lst, key=fnc), key=fnc)]
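Assuming the with_duplicates list from the question, usage is the same as with your routine:
without_duplicates = deduplicate(with_duplicates, key='id')
print(len(without_duplicates))  # 1 -- one dict kept per distinct "id"
As a side note, sorting makes this O(n log n), whereas the membership tests against a plain list in your version cost O(n) each, so the original loop is O(n²) in the worst case.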