(Already answered by sharth's comment.)
I've written a binary search algorithm in Python that more or less follows the same structure as the bisect_left function found in the bisect module. In fact it has a couple fewer conditionals, since I know the high point will be the length of the list and the low point will be 0. Yet for some reason the built-in function runs about five times as fast as mine.
My code is as follows:
def bisection_search(word, t):
    high = len(t)
    low = 0
    while low < high:
        half = (low + high) // 2  # floor division keeps the index an integer
        if t[half] < word:
            low = half + 1
        else:
            high = half
    return low
The source code for the built-in function is:
def bisect_left(a, x, lo=0, hi=None):
    if lo < 0:
        raise ValueError('lo must be non-negative')
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo
As you can see, they are virtually identical. However, the timed output for my function (searching for the last term in an ordered list of 100,000 words) is -3.60012054443e-05, whereas the built-in achieves -6.91413879395e-06. What explains this difference?
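A comparison along these lines can be reproduced with timeit. This is only a rough sketch: the word list below is a made-up stand-in for the 100,000-word list, and bisection_search is the function defined above.

import timeit
import bisect
import random
import string

# Made-up stand-in for an ordered list of 100,000 words
words = sorted("".join(random.choices(string.ascii_lowercase, k=8)) for _ in range(100000))
target = words[-1]  # search for the last term, as in the timings above

# Time 1,000 lookups with each version (bisection_search is defined above)
print(timeit.timeit(lambda: bisection_search(target, words), number=1000))
print(timeit.timeit(lambda: bisect.bisect_left(words, target), number=1000))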
In the source code there is a comment at the end that says "Overwrite above definitions with a fast C implementation" - is this what explains the difference? If so, how would I go about creating such a precompiled module?
Any advice is greatly appreciated.
Binary search is a widely used and very fast search algorithm for finding an item in a sorted list. It works on a divide-and-conquer principle: it compares the target value to the middle element, discards the half in which the target cannot lie, and repeats on the remaining half until the possible locations are narrowed down to one. The data set must be sorted in increasing or decreasing order before it is given to the algorithm.
While there's no standalone binary search function in Python, there is a module - bisect - designed to find the insertion point for an element in a sorted list using a binary search.
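For example, bisect_left returns the index at which a value would be inserted to keep the list sorted (the list here is just an illustration):

import bisect

words = ["apple", "banana", "cherry", "date"]  # must already be sorted
print(bisect.bisect_left(words, "banana"))     # 1 -- index of an existing entry
print(bisect.bisect_left(words, "blueberry"))  # 2 -- insertion point for a value not in the list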
To summarise the remarks above so the question can be closed: the reason the built-in module is faster is that it is precompiled in C. There are basically two options for replicating that performance. One is to use a JIT compiler such as PyPy, where the compilation is done at run time; the other is to compile your own module in C, using Cython or some other variant to integrate the C code with Python. The link from sharth above to the C code for bisect is particularly helpful and can be found here. Thanks again for all the help.
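For anyone who wants to try the second route, a minimal Cython sketch might look something like the following (the file and module names are made up). It is built with python setup.py build_ext --inplace and then imported like any other module.

# search_c.pyx -- hypothetical module name; Cython compiles this loop to C
def bisection_search(word, list t):
    cdef Py_ssize_t low = 0
    cdef Py_ssize_t high = len(t)
    cdef Py_ssize_t half
    while low < high:
        half = (low + high) // 2
        if t[half] < word:
            low = half + 1
        else:
            high = half
    return low

# setup.py -- hypothetical build script
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("search_c.pyx"))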