What would be the fastest way to search for a number (eg. 12.31) in long sorted list and get the values one before and after my "search" value when the exact value isn't found (eg. 11.12 and 12.03 in the list below)?
Many thanks in advance.
long_list = [10.11, 11.12, 13.03, 14.2 .. 12345.67]
The idea is to find the pivot point, divide the array into two sub-arrays and perform a binary search. For a sorted (in increasing order) and rotated array, the pivot element is the only element for which the next element to it is smaller than it. Using binary search based on the above idea, pivot can be found.
Binary search is an efficient algorithm for finding an item from a sorted list of items. It works by repeatedly dividing in half the portion of the list that could contain the item, until you've narrowed down the possible locations to just one.
In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the middle element of the array.
Elements within a sorted array are found using a binary search, in O(log n); thus sorted arrays are suited for cases when one needs to be able to look up elements quickly, e.g. as a set or multiset data structure. This complexity for lookups is the same as for self-balancing binary search trees.
The fastest is probably to use built-in support in python. Here I'm thinking about the bisect module. Below I'm using a dictionary to quickly check in O(1) if a value is in the list; if not, bisect
is used to find values smaller than and larger than the sought value.
#!/usr/bin/env python
import bisect
def find_lt(a, x):
'Find rightmost value less than x'
i = bisect.bisect_left(a, x)
if i:
return a[i-1]
raise ValueError
def find_gt(a, x):
'Find leftmost value greater than x'
i = bisect.bisect_right(a, x)
if i != len(a):
return a[i]
raise ValueError
# First create a test-list (49996 items)
i=1.0
R=[1.0]
D={}
while i < 10000:
i+=0.2
i=round(i,2)
D[i]=True
R.append(i)
# Locate a value, in this case 100.3 which is not in the list
x=100.3
if D.has_key(x):
print "found", x
else:
print find_lt(R, x)
print find_gt(R, x)
Output for x=100.3
:
100.2
100.4
Exponential search (AKA galloping search) would perform better than plain binary search if the list is very long. The idea is to scan forward from position 0 on increasing steps until the answer is passed at this point a binary search can be performed to the range formed by the last two steps. If the element is not found then the last attempt will point to the closest elements.
Have a look at Basic Techniques for information retrieval. The pseudo-code algorithm is provided and they discuss its complexity against binary search.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With