I am trying to set a value from the dict if the condition is met. Basically I iterate over the values of my dictionary and check if they fit my condition (then I break to loop which is not the best practice but saves some iterations)
Here is the code I am using:
for (key,value) in zip(mn_zone_dict.keys(), mn_zone_dict.values()):
if cost < value:
zone = key
break
I does it's job but it is relatively slow while I have to check > 10k records so I am looking for some smarter (and maybe more pythonic) way of solving that task. I have seen a function any() but it only returns if there is such entry which matches the conditions without telling which.
I'd be happy to hear your ideas and suggestions.
If you have the data directly as-is, with only a dictionary structure, you will have to iterate over it every time. The best speedup you can get would be to use a comprehension instead of a loop, and dict.items
instead of zip
:
zones = [k for k, v in my_zone_dict.items() if cost < v]
On the one hand this iterates over the whole dict. On the other, it tells you immediately how many values met the criterion, if any.
The problem here is that no matter how much less overhead a comprehension has than an explicit loop, this is still O(n)
for every lookup. The correct solution is to use a different, or complimentary data structure. Since you want value
to be greater than something, I'd recommend a max-heap.
Python implements heaps in the heapq
module. It is a bit unusual because it does not provide a heap object, just functions to heapify and maintain lists as heaps. Also, only min-heaps are supported, but that's OK because you can always just negate your values:
my_zone_list = [(-v, k) for k, v in my_zone_dict.items()]
heapq.heapify(my_zone_list)
This is a one-time O(n)
penalty, which you never have to repeat. Your whole loop now becomes an O(1)
operation:
if cost < -my_zone_list[0][0]:
zone = my_zone_list[0][1]
Inserting new elements has an O(log(n))
cost:
heapq.heappush(my_zone_list, (-new_value, new_key))
As a side note, if you can't introduce new data structures, you might get better performance with
v, zone = max((v, k) for k, v in my_zone_dict.items())
if cost < v: ...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With