I did an experiment in which I tried to find the time it takes to search a python list. I have a list arr
with random integers. arr_s
has the same elements only sorted.
arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)
Now I create a random array of integers find
which has elements that I want to search in arr
and arr_s
.
>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...: if i in arr:
...: continue
[OUT]:100 loops, best of 3: 2.18 ms per loop
>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...: if i in arr_s:
...: continue
[OUT]:100 loops, best of 3: 5.15 ms per loop
Now I understand that I have not used any specific method to search in the sorted array (e.g. binary search). So it might be doing the standard linear search but why does it take significantly longer to search in the sorted array that in the unsorted array? I would think that it should take nearly the same time. I have tried all sorts of find
array. Arrays which have integers from (0, 1000), (-1000, -100) and (-10000, 10000) the loops always take longer for the sorted array.
arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)
arr
is an array. arr_s
is a list. Searching an array can be handled efficiently by numpy, but searching a list requires following pointers and performing type checks. It has nothing to do with sorting.
Note: in
does weird things in numpy. Using in
with numpy ndarrays may be a bad idea.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With