Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why bisect slower than sort

I know that bisect is using binary search to keep lists sorted. However I did a timing test that the values are being read and sorted. But, on the contrary to my knowledge, keeping the values and then sorting them win the timing by high difference. Could more experienced users please explain this behavior ? Here is the code I use to test the timings.

import timeit

setup = """
import random
import bisect
a = range(100000)
random.shuffle(a)
"""

p1 = """
b = []
for i in a:
    b.append(i)
b.sort()
"""

p2 = """
b = []
for i in a:
    bisect.insort(b, i)
"""

print timeit.timeit(p1, setup=setup, number = 1)
print timeit.timeit(p2, setup=setup, number = 1)

# 0.0593081859178
# 1.69218442959
# Huge difference ! 35x faster.

In the first process I take values one-by-one instead of just sorting a to obtain a behavior like file reading. And it beats bisect very hard.

like image 405
Max Paython Avatar asked Jul 26 '16 18:07

Max Paython


People also ask

What is the purpose of bisect?

The purpose of Bisect algorithm is to find a position in list where an element needs to be inserted to keep the list sorted. Python in its definition provides the bisect algorithms using the module “bisect” which allows to keep the list in sorted order after the insertion of each element.

What is bisect Insort?

The bisect module in Python assists in preserving a list in a sorted order, as it bypasses the sort operation after each insertion. Insort is one of the functions of the bisect module.

What is Insort?

The insort() method inserts a new element into an already sorted Python list. If the list already has existing elements as the new element then the new element is inserted into the right of the last such existing element. The functions insort() and insort_right() behave the same way.

What is bisect module?

The bisect module ensures that the list remains automatically sorted after insertion. For this purpose, it uses bisection algorithm.


3 Answers

Sorting a list takes about O(N*log(N)) time. Appending N items to a list takes O(N) time. Doing these things consecutively takes about O(N*log(N)) time.

Bisecting a list takes O(log(n)) time. Inserting an item into a list takes O(N) time. Doing both N times inside a for loop takes O(N * (N + log(n))) == O(N^2) time.

O(N^2) is worse than O(N*log(N)), so your p1 is faster than your p2.

like image 60
Kevin Avatar answered Oct 17 '22 10:10

Kevin


Your algorithmic complexity will be worse in the bisect case ...

In the bisect case, you have N operations (each at an average cost of log(N) to find the insertion point and then an additional O(N) step to insert the item). Total complexity: O(N^2).

With sort, you have a single Nlog(N) step (plus N O(1) steps to build the list in the first place). Total complexity: O(Nlog(N))

Also note that sort is implemented in very heavily optimized C code (bisect isn't quite as optimized since it ends up calling various comparison functions much more frequently...)

like image 29
mgilson Avatar answered Oct 17 '22 09:10

mgilson


To understand the time difference, let’s look at what you are actually doing there.

In your first example, you are taking an empty list, and append items to it, and sorting it in the end.

Appending to lists is really cheap, it has an amortized time complexity of O(1). It cannot be really constant time because the underlying data structure, a simple array, eventually needs to be expanded as the list grows. This is done every so often which causes a new array to be allocated and the data being copied. That’s a bit more expensive. But in general, we still say this is O(1).

Next up comes the sorting. Python is using Timsort which is very efficient. This is O(n log n) at average and worst case. So overall, we get constant time following O(n log n) so the sorting is the only thing that matters here. In total, this is pretty simple and very fast.

The second example uses bisect.insort. This utilizes a list and binary search to ensure that the list is sorted at all times.

Essentially, on every insert, it will use binary search to find the correct location to insert the new value, and then shift all items correctly to make room at that index for the new value. Binary search is cheap, O(log n) on average, so this is not a problem. Shifting alone is also not that difficult. In the worst case, we need to move all items one index to the right, so we get O(n) (this is basically the insert operation on lists).

So in total, we would get linear time at worst. However, we do this on every single iteration. So when inserting n elements, we have O(n) each time. This results in a quadratic complexity, O(n²). This is a problem, and will ultimately slow the whole thing down.

So what does this tell us? Sorted inserting into a list to get a sorted result is not really performant. We can use the bisect module to keep an already sorted list ordered when we only do a few operations, but when we actually have unsorted data, it’s easier to sort the data as a whole.

like image 4
poke Avatar answered Oct 17 '22 11:10

poke