
Best way of removing duplicates from a list by object attribute

I have a list of objects, and I want to filter it so that the result contains only one occurrence of each attribute value.

For instance, let's say I have three objects

obj1.my_attr = 'a'
obj2.my_attr = 'b'
obj3.my_attr = 'b'

obj_list = [obj1, obj2, obj3]

In the end, I want to get [obj1, obj2]. Order does not actually matter, so [obj1, obj3] is just as good.

First I thought of the typical clunky imperative approach, like the following:

record = set()
result = []

for obj in obj_list:
    if obj.my_attr not in record:
        record.add(obj.my_attr)
        result.append(obj)

Then I thought of mapping it to a dictionary, using the key to overwrite any previous entry, and finally extracting the values:

result = {obj.my_attr: obj for obj in obj_list}.values() 

This one looks good, but I would like to know if there is a more elegant, efficient or functional way of achieving this. Maybe some sweet thing hidden in the standard library... Thanks in advance.
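The two approaches above do not keep the same object when several share a value: the loop keeps the first one seen, while the dictionary keeps the last. A minimal sketch of that difference, assuming Python 3.7+ (where dictionaries preserve insertion order) and using SimpleNamespace objects with a hypothetical name attribute as stand-ins for the real objects:

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the objects in the question; "name" is only
# there so the results are easy to read.
obj1 = SimpleNamespace(name="obj1", my_attr="a")
obj2 = SimpleNamespace(name="obj2", my_attr="b")
obj3 = SimpleNamespace(name="obj3", my_attr="b")
obj_list = [obj1, obj2, obj3]

# Loop with a set: the first object seen for each value wins.
seen = set()
first_wins = []
for obj in obj_list:
    if obj.my_attr not in seen:
        seen.add(obj.my_attr)
        first_wins.append(obj)

# Dict comprehension: a later object overwrites an earlier one with the
# same key, so the last object for each value wins.
last_wins = list({obj.my_attr: obj for obj in obj_list}.values())

print([o.name for o in first_wins])  # ['obj1', 'obj2']
print([o.name for o in last_wins])   # ['obj1', 'obj3']
```

Since order does not matter here, either result is acceptable. Note that in Python 3 .values() returns a view, hence the list() call in the sketch.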

bgusach asked Jul 07 '14 15:07



2 Answers

If you want to use a functional programming style in Python, you may want to check out the toolz package. With toolz, you could simply do:

toolz.unique(obj_list, key=lambda x: x.my_attr)

For better performance, you could use operator.attrgetter('my_attr') instead of the lambda function for the key. You could also use cytoolz, which is a fast implementation of toolz written in Cython.
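If adding a dependency is not an option, the same idea can be sketched with the standard library alone. This is only an illustration of a key-based unique filter, not toolz's actual implementation; unique_by is a hypothetical helper name, and the SimpleNamespace objects stand in for the real ones:

```python
from operator import attrgetter
from types import SimpleNamespace


def unique_by(items, key):
    """Lazily yield the first item seen for each distinct key value."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Hypothetical sample data mirroring the question.
objs = [SimpleNamespace(my_attr=v) for v in ("a", "b", "b")]

# attrgetter builds the key function without a lambda.
result = list(unique_by(objs, key=attrgetter("my_attr")))
print([o.my_attr for o in result])  # ['a', 'b']
```

The key values must be hashable for this to work, since they are stored in a set.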

eriknw answered Sep 23 '22 04:09



You could wrap each object in a class that defines custom __hash__ and __eq__ methods based on the attribute:

class HashMyAttr:
    """Wrapper that hashes and compares an object by its my_attr only."""
    def __init__(self, obj):
        self.obj = obj
    def __hash__(self):
        return hash(self.obj.my_attr)
    def __eq__(self, other):
        return self.obj.my_attr == other.obj.my_attr

And use it like:

obj_list = [x.obj for x in set(HashMyAttr(obj) for obj in obj_list)]
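
A self-contained sketch of the wrapper in use, with SimpleNamespace objects as hypothetical stand-ins for the real ones. Iteration order of a set is arbitrary, so the example only checks which attribute values survive, not their order:

```python
from types import SimpleNamespace


class HashMyAttr:
    """Wrapper that hashes and compares an object by its my_attr only."""

    def __init__(self, obj):
        self.obj = obj

    def __hash__(self):
        return hash(self.obj.my_attr)

    def __eq__(self, other):
        return self.obj.my_attr == other.obj.my_attr


# Hypothetical sample data mirroring the question.
obj_list = [SimpleNamespace(my_attr=v) for v in ("a", "b", "b")]

# Wrappers with equal my_attr collapse into a single set element.
deduped = [x.obj for x in set(HashMyAttr(obj) for obj in obj_list)]

# Sort only to get a stable view of the surviving attribute values.
attrs = sorted(o.my_attr for o in deduped)
print(attrs)  # ['a', 'b']
```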
njzk2 answered Sep 23 '22 04:09
