
Python filtering list of objects by distinct attribute

I have a list of objects with multiple attributes. I want to keep only the first object for each distinct value of one attribute (country_code), i.e.

Current list

elems = [{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '6880'},
{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '3200'},
{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '4000'},
{'region_code': 'EUD', 'country_code': 'SVN', 'country_desc': 'Slovenia', 'event_number': '6880'},
{'region_code': 'EUD', 'country_code': 'NLD', 'country_desc': 'Netherlands', 'event_number': '6880'},
{'region_code': 'EUD', 'country_code': 'BEL', 'country_desc': 'Belgium', 'event_number': '6880'}]

Desired list

elems = [{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '6880'}, 
{'region_code': 'EUD', 'country_code': 'SVN', 'country_desc': 'Slovenia', 'event_number': '6880'}, 
{'region_code': 'EUD', 'country_code': 'NLD', 'country_desc': 'Netherlands', 'event_number': '6880'}, 
{'region_code': 'EUD', 'country_code': 'BEL', 'country_desc': 'Belgium', 'event_number': '6880'}]

I can achieve this by creating a dictionary and a for-loop, but I feel like there's an easier way in Python using the filter() or reduce() functions. I just can't figure out how.

Can anyone simplify the code below using built-in Python functions? Performance is a big factor because the real data will be substantial.

Working code:

unique = {}
for elem in elems:
    if elem['country_code'] not in unique.keys():
        unique[elem['country_code']] = elem

print(unique.values())

It is worth noting that I have also tried the code below, but it performs worse than the working code above:

unique = []
for elem in elems:
    if not any(u['country_code'] == elem['country_code'] for u in unique):
        unique.append(elem)

Naadof asked Oct 14 '25 14:10


2 Answers

I think your first approach is already pretty close to being optimal. Dictionary lookup is fast (just as fast as in a set) and the loop is easy to understand, even though a bit lengthy (by Python standards), but you should not sacrifice readability for brevity.
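To make the performance point concrete, here is a rough timing sketch. The synthetic data, its size, and the helper names dedupe_dict and dedupe_any are assumptions for illustration and are not part of the question; actual numbers will vary by machine and Python version.

```python
import timeit

# Synthetic data (an assumption for illustration): 2,000 records, 200 distinct codes.
elems = [{"country_code": "C%03d" % (i % 200), "event_number": str(i)}
         for i in range(2000)]


def dedupe_dict(items):
    """Keep the first record per country_code using a dict (one hash lookup per record)."""
    unique = {}
    for elem in items:
        if elem["country_code"] not in unique:
            unique[elem["country_code"]] = elem
    return list(unique.values())


def dedupe_any(items):
    """Keep the first record per country_code by rescanning the result list each time."""
    unique = []
    for elem in items:
        if not any(u["country_code"] == elem["country_code"] for u in unique):
            unique.append(elem)
    return unique


# Both strategies should select the same records in the same order (Python 3.7+).
assert dedupe_dict(elems) == dedupe_any(elems)

print("dict:", timeit.timeit(lambda: dedupe_dict(elems), number=5))
print("any: ", timeit.timeit(lambda: dedupe_any(elems), number=5))
```

The dict version does one hash lookup per record, while the any() version compares each record against every unique record found so far, so the gap should widen as the number of distinct codes grows.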

You can, however, shave off one line using setdefault, and you might want to use collections.OrderedDict() so that the elements in the resulting list are in their original order. Also, note that in Python 3, unique.values() is not a list but a view on the dict.

import collections

unique = collections.OrderedDict()
for elem in elems:
    unique.setdefault(elem["country_code"], elem)
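As a small follow-up sketch: on Python 3.7 and later a plain dict also preserves insertion order, so the same loop should work without collections, and list() turns the values view into an actual list. The shortened elems sample below is an assumption that mirrors the question's data.

```python
# Shortened sample mirroring the question's data (assumed for illustration).
elems = [
    {"country_code": "ROM", "event_number": "6880"},
    {"country_code": "ROM", "event_number": "3200"},
    {"country_code": "SVN", "event_number": "6880"},
]

unique = {}  # plain dicts keep insertion order on Python 3.7+
for elem in elems:
    # setdefault stores elem only if the key is not present yet
    unique.setdefault(elem["country_code"], elem)

result = list(unique.values())  # materialize the view as a real list
print(result)
```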

If you really, really want to use reduce, you can use the empty dict as an initializer and then use d.setdefault(k,v) and d to set the value (if not present) and return the modified dict.

import collections
from functools import reduce  # reduce is not a builtin in Python 3
unique = reduce(lambda unique, elem: unique.setdefault(elem["country_code"], elem) and unique,
                elems, collections.OrderedDict())

I would just use the loop, though.

tobias_k answered Oct 18 '25 12:10


I think that your approach is just fine. It would be slightly better to check elem['country_code'] not in unique instead of elem['country_code'] not in unique.keys().

However, here is another way to do it with a list comprehension:

visited = set()
res = [e for e in elems
        if e['country_code'] not in visited
        and not visited.add(e['country_code'])]

The last bit abuses the fact that set.add returns None, so not visited.add(...) always evaluates to True while recording the code as a side effect.
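If the side effect inside the comprehension feels too clever, the same idea can be written as a small generator. This is only a sketch: the name unique_by and its key parameter are hypothetical helpers, not something from the standard library, and the shortened elems sample is assumed for illustration.

```python
def unique_by(items, key):
    """Yield the first item seen for each distinct value of key(item)."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Shortened sample mirroring the question's data (assumed for illustration).
elems = [
    {"country_code": "ROM", "event_number": "6880"},
    {"country_code": "ROM", "event_number": "3200"},
    {"country_code": "BEL", "event_number": "6880"},
]

res = list(unique_by(elems, key=lambda e: e["country_code"]))
print(res)
```

Because it is a generator, it can also be consumed lazily when only the first few distinct records are needed.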

hilberts_drinking_problem answered Oct 18 '25 11:10


