Let's say in Python I have a list of files with their respective sizes, represented as a dict (I don't care about the structure, you can propose another one):
from random import randint

def gen_rand_fileslist(nbfiles=100, maxvalue=100):
    fileslist = {}
    for i in xrange(nbfiles):
        fileslist["file_" + str(i)] = randint(1, maxvalue)
    return fileslist

fileslist = gen_rand_fileslist(10)
Example fileslist:
{'file_0': 2,
'file_1': 21,
'file_2': 20,
'file_3': 16,
'file_4': 12,
'file_5': 67,
'file_6': 95,
'file_7': 16,
'file_8': 2,
'file_9': 5}
Now I want to quickly find the highest value below the specified threshold. For example:
get_value_below(fileslist, threshold=25) # result should be 'file_1' with value 21
The function get_value_below() is to be called in a tight loop, so it should be as fast as possible, and any threshold can be specified (so sorting doesn't help directly).
Is there a way to be faster than just walking through the whole list (linear time)?
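For reference, here is a minimal linear-scan sketch of the `get_value_below` described above (the function body is my own; I'm treating "below" as strictly less than the threshold):

```python
def get_value_below(fileslist, threshold):
    """Return the (name, size) pair with the largest size strictly
    below threshold, or None if no file qualifies. O(n) per call."""
    best = None
    for name, size in fileslist.items():
        if size < threshold and (best is None or size > best[1]):
            best = (name, size)
    return best

fileslist = {'file_0': 2, 'file_1': 21, 'file_2': 20, 'file_3': 16,
             'file_4': 12, 'file_5': 67, 'file_6': 95, 'file_7': 16,
             'file_8': 2, 'file_9': 5}
print(get_value_below(fileslist, threshold=25))  # ('file_1', 21)
```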
It all depends on how often you are going to search the fileslist. If you are going to perform more than Θ(log n) queries, it's better to sort once and then do a binary search for each query: sorting costs O(n log n), after which each query is O(log n) instead of O(n). If you only need one query, a linear scan is better, since the element you want can be virtually anywhere and you'll have to visit every element anyway.
If you go with sorting first and binary searching afterwards, use bisect_right: for an input x it returns the insertion point, so the element just before that index is the largest one less than or equal to x.
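A sketch of that sort-then-bisect approach (the helper names `build_index` and `query` are my own, and I follow the "less than or equal to x" semantics described above):

```python
from bisect import bisect_right

def build_index(fileslist):
    """Sort (size, name) pairs once, O(n log n); keep a parallel
    list of sizes for bisect to search."""
    pairs = sorted((size, name) for name, size in fileslist.items())
    sizes = [size for size, _ in pairs]
    return sizes, pairs

def query(sizes, pairs, threshold):
    """Largest size <= threshold in O(log n). bisect_right returns
    the insertion point, so pairs[i - 1] holds the answer."""
    i = bisect_right(sizes, threshold)
    if i == 0:
        return None  # every size exceeds the threshold
    size, name = pairs[i - 1]
    return name, size

fileslist = {'file_0': 2, 'file_1': 21, 'file_2': 20, 'file_3': 16,
             'file_4': 12, 'file_5': 67, 'file_6': 95, 'file_7': 16,
             'file_8': 2, 'file_9': 5}
sizes, pairs = build_index(fileslist)
print(query(sizes, pairs, 25))  # ('file_1', 21)
```

If you need strictly-below rather than less-or-equal semantics, swap bisect_right for bisect_left; everything else stays the same.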