
Did I reinvent the wheel with this deduplicating function?

I was looking for a set()-like way to deduplicate a list, except that the items in the original list are not hashable (they are dicts).

I spent a while looking for something adequate, and I ended up writing this little function:

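def deduplicate_list(lst, key):
    """Keep the first item seen for each distinct value of item[key]."""
    output = []
    keys = []  # values of item[key] already seen
    for i in lst:
        if i[key] not in keys:
            output.append(i)
            keys.append(i[key])

    return output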

Provided that key is given correctly and is a string, this function does its job pretty well. Needless to say, if there is a built-in or a standard library module that offers the same functionality, I'll happily drop my little routine in favor of a more standard and robust choice.

Are you aware of such an implementation?

-- Note

The following one-liner, found in this answer,

[dict(t) for t in set([tuple(d.items()) for d in l])]

while clever, won't work because my items contain nested dicts.
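
As a quick illustration of why (a minimal sketch; the sample data and the helper name set_dedup are made up here, not taken from the linked answer): a tuple of items can only go into a set if every value in it is hashable, and a nested dict is not.

```python
def set_dedup(dicts):
    # Hypothetical wrapper around the one-liner quoted above.
    return [dict(t) for t in set(tuple(d.items()) for d in dicts)]

# Made-up sample data: flat dicts vs. dicts holding a nested dict.
flat = [{"id": "1"}, {"id": "1"}]
nested = [
    {"id": "1234", "attributes": {"handle": "jsmith"}},
    {"id": "1234", "attributes": {"handle": "jsmith"}},
]

print(set_dedup(flat))  # works: every value is hashable

try:
    set_dedup(nested)
except TypeError as exc:
    # The inner dict makes the tuple of items unhashable.
    print("TypeError:", exc)
```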

-- Example

For clarity, here is an example of using such a routine:

with_duplicates = [
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "[email protected]",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    },
    {
        "type": "users",
        "attributes": {
            "first-name": "John",
            "email": "[email protected]",
            "last-name": "Smith",
            "handle": "jsmith"
        },
        "id": "1234"
    }
]

without_duplicates = deduplicate_list(with_duplicates, key='id')

Asked by Jivan, Jun 03 '16 12:06


1 Answer

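You are keeping only the first dict in your list for each distinct value of key. itertools.groupby from the standard library can do that for you: sort by key, group by key, and take only the first item from each group. Because sorted is stable, the item kept for each key is still the first one from the original list, but the result comes back ordered by key rather than in the original order: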

from itertools import groupby

def deduplicate(lst, key):
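    # d.get(key) returns None instead of raising KeyError when key is missing;
    # note that in Python 3 None cannot be sorted against other values
    fnc = lambda d: d.get(key)
    # sort so equal keys are adjacent, then keep the first dict of each group
    return [next(g) for k, g in groupby(sorted(lst, key=fnc), key=fnc)]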
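
The groupby version returns the items sorted by key and needs key values that can be compared with each other. If the original order matters, a dict keyed on those values is one possible alternative. This is only a sketch under a few assumptions: deduplicate_keep_order is a name made up here, every item is assumed to contain the key, the values under it are assumed to be hashable (as the string ids in the question are), and it relies on dicts keeping insertion order (Python 3.7+).

```python
def deduplicate_keep_order(lst, key):
    # Hypothetical helper: maps each key value to the first item that had it.
    first_seen = {}
    for item in lst:
        # setdefault stores the item only if this key value is new
        first_seen.setdefault(item[key], item)
    # dicts preserve insertion order, so this follows the original list order
    return list(first_seen.values())
```

Each lookup is a hash lookup, so this avoids both the sort and the repeated list scans of the function in the question.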

Answered by user2390182, Oct 21 '22 20:10