Python filtering list of objects by distinct attribute

Question

I have a list of objects (dicts) with multiple attributes. I want to filter the list so that it keeps only one element per distinct value of one attribute (country_code), namely the first one, i.e.

Current list

elems = [{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '6880'},
{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '3200'},
{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '4000'},
{'region_code': 'EUD', 'country_code': 'SVN', 'country_desc': 'Slovenia', 'event_number': '6880'},
{'region_code': 'EUD', 'country_code': 'NLD', 'country_desc': 'Netherlands', 'event_number': '6880'},
{'region_code': 'EUD', 'country_code': 'BEL', 'country_desc': 'Belgium', 'event_number': '6880'}]

Desired list

elems = [{'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '6880'}, 
{'region_code': 'EUD', 'country_code': 'SVN', 'country_desc': 'Slovenia', 'event_number': '6880'}, 
{'region_code': 'EUD', 'country_code': 'NLD', 'country_desc': 'Netherlands', 'event_number': '6880'}, 
{'region_code': 'EUD', 'country_code': 'BEL', 'country_desc': 'Belgium', 'event_number': '6880'}]

I can achieve this with a dictionary and a for loop, but I feel there should be an easier way in Python using the filter() or reduce() functions; I just can't figure out how.

Can anyone simplify the code below using built-in Python functions? Performance is a big factor because the real data set will be large.

Working code:

unique = {}
for elem in elems:
    if elem['country_code'] not in unique.keys():
        unique[elem['country_code']] = elem

print(unique.values())

It is worth noting that I have also tried the code below, but it performs worse than the working code above:

unique = []
for elem in elems:
    if not any(u['country_code'] == elem['country_code'] for u in unique):
        unique.append(elem)

tobias_k · Accepted Answer

I think your first approach is already close to optimal. Dictionary lookup is fast (just as fast as set lookup), and the loop is easy to understand, even if a bit lengthy by Python standards; you should not sacrifice readability for brevity.
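To make the speed argument concrete: the any() version from the question rescans the growing unique list for every element, so it does roughly n × k comparisons (n records, k distinct codes), whereas a dict membership test takes constant time on average. The sketch below times both versions on synthetic data; the function names and the generated records are hypothetical, and the printed timings will vary by machine.

```python
import timeit


def first_per_code_dict(elems):
    """Dict-based version from the question: O(1) average membership test."""
    unique = {}
    for elem in elems:
        if elem['country_code'] not in unique:
            unique[elem['country_code']] = elem
    return list(unique.values())


def first_per_code_any(elems):
    """List + any() version from the question: rescans `unique` for each element."""
    unique = []
    for elem in elems:
        if not any(u['country_code'] == elem['country_code'] for u in unique):
            unique.append(elem)
    return unique


# Hypothetical synthetic data: 5,000 records spread over 200 made-up codes.
data = [{'country_code': f'C{i % 200}', 'event_number': str(i)} for i in range(5000)]

# Both versions keep the first record per code, in first-seen order.
assert first_per_code_dict(data) == first_per_code_any(data)

for fn in (first_per_code_dict, first_per_code_any):
    seconds = timeit.timeit(lambda: fn(data), number=3)
    print(f'{fn.__name__}: {seconds:.4f} s for 3 runs')
```

The equality check relies on plain dicts preserving insertion order, which Python guarantees from 3.7 on.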

You can, however, shave off one line using setdefault, and you might want to use collections.OrderedDict() so that the elements in the resulting list are in their original order (since Python 3.7, a plain dict also preserves insertion order). Also, note that in Python 3, unique.values() is not a list but a view on the dict; wrap it in list() if you need an actual list.

import collections

unique = collections.OrderedDict()
for elem in elems:
    unique.setdefault(elem["country_code"], elem)
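A self-contained run of that loop on the sample data from the question, as a sketch: it adds the collections import and converts the values view to a list. On Python 3.7 and later a plain {} would behave the same way here, because ordinary dicts also preserve insertion order.

```python
import collections

elems = [
    {'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '6880'},
    {'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '3200'},
    {'region_code': 'EUD', 'country_code': 'ROM', 'country_desc': 'Romania', 'event_number': '4000'},
    {'region_code': 'EUD', 'country_code': 'SVN', 'country_desc': 'Slovenia', 'event_number': '6880'},
    {'region_code': 'EUD', 'country_code': 'NLD', 'country_desc': 'Netherlands', 'event_number': '6880'},
    {'region_code': 'EUD', 'country_code': 'BEL', 'country_desc': 'Belgium', 'event_number': '6880'},
]

unique = collections.OrderedDict()
for elem in elems:
    # setdefault stores elem only if its country_code is not a key yet,
    # so the first record per country wins.
    unique.setdefault(elem["country_code"], elem)

# In Python 3, values() is a view; wrap it in list() to get a real list.
result = list(unique.values())
print([(e["country_code"], e["event_number"]) for e in result])
# [('ROM', '6880'), ('SVN', '6880'), ('NLD', '6880'), ('BEL', '6880')]
```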

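If you really, really want to use reduce, you can use an empty dict as the initializer and then use d.setdefault(k, v) and d to set the value (if not already present) and return the modified dict. This works because setdefault returns the stored record, which is a non-empty (and therefore truthy) dict, so the and always evaluates to d.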

import collections
from functools import reduce  # reduce is not a builtin in Python 3

unique = reduce(lambda unique, elem: unique.setdefault(elem["country_code"], elem) and unique,
                elems, collections.OrderedDict())
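The `setdefault(...) and unique` trick depends on the stored value being truthy; that always holds for dicts that contain a country_code key, but it would break for value types that can be falsy. A sketch with a small named reducer sidesteps the question entirely; the name keep_first is hypothetical, not part of any library.

```python
import collections
from functools import reduce  # not a builtin in Python 3


def keep_first(acc, elem):
    """Hypothetical reducer: store elem under its country_code unless one is already there."""
    acc.setdefault(elem["country_code"], elem)
    return acc  # always return the accumulator, independent of elem's truthiness


elems = [
    {'country_code': 'ROM', 'event_number': '6880'},
    {'country_code': 'ROM', 'event_number': '3200'},
    {'country_code': 'SVN', 'event_number': '6880'},
]

unique = reduce(keep_first, elems, collections.OrderedDict())
print(list(unique.values()))
# [{'country_code': 'ROM', 'event_number': '6880'}, {'country_code': 'SVN', 'event_number': '6880'}]
```

Functionally this is the same as the plain loop, just routed through reduce, which is why the loop remains the more readable choice.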

I would just use the loop, though.

hilberts_drinking_problem · Answer

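I think that your approach is just fine. It would be slightly better to check elem['country_code'] not in unique instead of elem['country_code'] not in unique.keys(): in Python 2, keys() builds a new list on every call, making each check O(n), and in Python 3 it only adds an unnecessary method call.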

However, here is another way to do it with a list comprehension:

visited = set()
res = [e for e in elems
        if e['country_code'] not in visited
        and not visited.add(e['country_code'])]

The last bit abuses the fact that set.add returns None and not None is True, so the second condition is always true and only serves to record the code in visited.
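The same seen-set idea can be written without a side effect inside the condition, as a small reusable generator. This is a sketch: unique_by is a hypothetical helper name, not a standard-library function, although the itertools recipes in the Python documentation describe a similar unique_everseen.

```python
from operator import itemgetter
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> Iterator[T]:
    """Yield the first item for each distinct key, preserving the original order."""
    seen = set()
    for item in items:
        k = key(item)
        if k not in seen:  # O(1) average set lookup
            seen.add(k)
            yield item


elems = [
    {'country_code': 'ROM', 'event_number': '6880'},
    {'country_code': 'ROM', 'event_number': '3200'},
    {'country_code': 'SVN', 'event_number': '6880'},
    {'country_code': 'NLD', 'event_number': '6880'},
]

res = list(unique_by(elems, key=itemgetter('country_code')))
print([e['country_code'] for e in res])  # ['ROM', 'SVN', 'NLD']
```

Because it is a generator, it can also consume a large input lazily, without building the whole filtered list up front.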


Tags: python, dictionary, filtering

Naadof

2 Answers

tobias_k

hilberts_drinking_problem
