Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find value in the dictionary that matches the condition

I am trying to set a value from the dict if the condition is met. Basically I iterate over the values of my dictionary and check if they fit my condition (then I break to loop which is not the best practice but saves some iterations)

Here is the code I am using:

for (key,value) in zip(mn_zone_dict.keys(), mn_zone_dict.values()):
    if cost < value:
        zone = key
        break

I does it's job but it is relatively slow while I have to check > 10k records so I am looking for some smarter (and maybe more pythonic) way of solving that task. I have seen a function any() but it only returns if there is such entry which matches the conditions without telling which.

I'd be happy to hear your ideas and suggestions.

like image 643
hikamare Avatar asked Nov 26 '22 01:11

hikamare


1 Answers

If you have the data directly as-is, with only a dictionary structure, you will have to iterate over it every time. The best speedup you can get would be to use a comprehension instead of a loop, and dict.items instead of zip:

zones = [k for k, v in my_zone_dict.items() if cost < v]

On the one hand this iterates over the whole dict. On the other, it tells you immediately how many values met the criterion, if any.

The problem here is that no matter how much less overhead a comprehension has than an explicit loop, this is still O(n) for every lookup. The correct solution is to use a different, or complimentary data structure. Since you want value to be greater than something, I'd recommend a max-heap.

Python implements heaps in the heapq module. It is a bit unusual because it does not provide a heap object, just functions to heapify and maintain lists as heaps. Also, only min-heaps are supported, but that's OK because you can always just negate your values:

my_zone_list = [(-v, k) for k, v in my_zone_dict.items()]
heapq.heapify(my_zone_list)

This is a one-time O(n) penalty, which you never have to repeat. Your whole loop now becomes an O(1) operation:

if cost < -my_zone_list[0][0]:
    zone = my_zone_list[0][1]

Inserting new elements has an O(log(n)) cost:

heapq.heappush(my_zone_list, (-new_value, new_key))

As a side note, if you can't introduce new data structures, you might get better performance with

v, zone = max((v, k) for k, v in my_zone_dict.items())
if cost < v: ...
like image 51
Mad Physicist Avatar answered Jan 18 '23 14:01

Mad Physicist