I did an experiment in which I tried to find the time it takes to search a python list. I have a list arr with random integers. arr_s has the same elements only sorted.
arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)
Now I create a random array of integers find which has elements that I want to search in arr and arr_s.
>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr:
...:        continue
[OUT]:100 loops, best of 3: 2.18 ms per loop
>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr_s:
...:        continue
[OUT]:100 loops, best of 3: 5.15 ms per loop
Now I understand that I have not used any specific method to search in the sorted array (e.g. binary search). So it might be doing the standard linear search but why does it take significantly longer to search in the sorted array that in the unsorted array? I would think that it should take nearly the same time. I have tried all sorts of find array. Arrays which have integers from (0, 1000), (-1000, -100) and (-10000, 10000) the loops always take longer for the sorted array.
arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)
arr is an array. arr_s is a list. Searching an array can be handled efficiently by numpy, but searching a list requires following pointers and performing type checks. It has nothing to do with sorting.
Note: in does weird things in numpy. Using in with numpy ndarrays may be a bad idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With