Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why search in sorted list in python takes longer?

I did an experiment in which I tried to find the time it takes to search a python list. I have a list arr with random integers. arr_s has the same elements only sorted.

arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)

Now I create a random array of integers find which has elements that I want to search in arr and arr_s.

>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr:
...:        continue

[OUT]:100 loops, best of 3: 2.18 ms per loop


>>> %%timeit
...:find = np.random.randint(0, 1000, 600)
...:for i in find:
...:    if i in arr_s:
...:        continue

[OUT]:100 loops, best of 3: 5.15 ms per loop

Now I understand that I have not used any specific method to search in the sorted array (e.g. binary search). So it might be doing the standard linear search but why does it take significantly longer to search in the sorted array that in the unsorted array? I would think that it should take nearly the same time. I have tried all sorts of find array. Arrays which have integers from (0, 1000), (-1000, -100) and (-10000, 10000) the loops always take longer for the sorted array.

like image 308
Shishir Pandey Avatar asked Sep 05 '13 18:09

Shishir Pandey


1 Answers

arr = np.random.randint(low = 0, high = 1000, size = 500)
arr_s = sorted(arr)

arr is an array. arr_s is a list. Searching an array can be handled efficiently by numpy, but searching a list requires following pointers and performing type checks. It has nothing to do with sorting.

Note: in does weird things in numpy. Using in with numpy ndarrays may be a bad idea.

like image 56
user2357112 supports Monica Avatar answered Oct 19 '22 11:10

user2357112 supports Monica