I know that bisect uses binary search to keep a list sorted as items are inserted. However, I ran a timing test in which the values are read one by one and inserted with bisect, and, contrary to my expectations, appending the values and then sorting them wins by a wide margin. Could more experienced users please explain this behavior? Here is the code I use to test the timings.
import timeit
setup = """
import random
import bisect
a = list(range(100000))
random.shuffle(a)
"""
p1 = """
b = []
for i in a:
    b.append(i)
b.sort()
"""
p2 = """
b = []
for i in a:
    bisect.insort(b, i)
"""
print(timeit.timeit(p1, setup=setup, number=1))
print(timeit.timeit(p2, setup=setup, number=1))
# 0.0593081859178
# 1.69218442959
# Huge difference! The append-then-sort version is roughly 28x faster.
In the first snippet I take the values one by one instead of just sorting a directly, to mimic reading them from a file. And it beats bisect very clearly.
The purpose of the bisect algorithm is to find the position in a sorted list at which a new element must be inserted to keep the list sorted. Python ships this as the standard-library module bisect, which lets you maintain a list in sorted order after every insertion without calling sort each time.
insort is one of the functions of the bisect module. insort() inserts a new element into an already sorted list; if the list contains elements equal to the new one, the new element is placed to the right of the last such element (insort() and insort_right() behave the same way). Internally, it uses the bisection algorithm, i.e. binary search, to find the insertion point.
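A minimal illustration of that behavior (the example values are mine, not from the original post):

import bisect

b = [1, 3, 3, 5]
bisect.insort(b, 3)  # same as insort_right: the new 3 lands to the
print(b)             # right of the existing 3s -> [1, 3, 3, 3, 5]

bisect.insort(b, 4)  # the list stays sorted after every insert
print(b)             # [1, 3, 3, 3, 4, 5]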
Sorting a list takes about O(N*log(N)) time. Appending N items to a list takes O(N) time. Doing these things consecutively takes about O(N*log(N)) time.
Bisecting a list takes O(log(N)) time. Inserting an item into a list takes O(N) time. Doing both N times inside a for loop takes O(N * (N + log(N))) == O(N^2) time.
O(N^2) is worse than O(N*log(N)), so your p1 is faster than your p2.
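If you want to see the asymptotics directly, a small harness like the following makes the growth rates visible (a sketch, not from the original post; sizes are arbitrary and timings are machine-dependent). Doubling N should a bit more than double the sort-based time, but roughly quadruple the insort-based time:

import timeit

for n in (10000, 20000, 40000):
    setup = (
        "import random, bisect\n"
        f"a = list(range({n}))\n"
        "random.shuffle(a)"
    )
    # copy-and-sort: O(N log N)
    t_sort = timeit.timeit("b = list(a); b.sort()", setup=setup, number=1)
    # insort one by one: O(N^2)
    t_insort = timeit.timeit(
        "b = []\n"
        "for i in a:\n"
        "    bisect.insort(b, i)",
        setup=setup, number=1,
    )
    print(f"N={n}: sort={t_sort:.4f}s insort={t_insort:.4f}s")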
Your algorithmic complexity will be worse in the bisect case. In the bisect case, you have N operations, each at an average cost of log(N) to find the insertion point plus an additional O(N) step to insert the item. Total complexity: O(N^2).
With sort, you have a single N*log(N) step (plus N O(1) steps to build the list in the first place). Total complexity: O(N*log(N)).
Also note that sort is implemented in very heavily optimized C code (bisect isn't quite as optimized, since it ends up calling various comparison functions much more frequently).
To understand the time difference, let’s look at what you are actually doing there.
In your first example, you are taking an empty list, appending items to it, and sorting it at the end.
Appending to lists is really cheap: it has an amortized time complexity of O(1). It cannot be truly constant time because the underlying data structure, a plain array, eventually needs to be expanded as the list grows. That happens every so often, causing a new array to be allocated and the data to be copied, which is a bit more expensive. But in general, we still call this O(1) amortized.
Next up comes the sorting. Python uses Timsort, which is very efficient: O(n log n) in both the average and the worst case. So overall we have n constant-time appends followed by a single O(n log n) sort, which means the sort is the only thing that matters here. In total, this is pretty simple and very fast.
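Put differently, the whole append-then-sort loop in p1 boils down to a single sorted() call (a quick sketch, re-creating the shuffled a from the question):

import random

a = list(range(100000))
random.shuffle(a)
b = sorted(a)  # build the copy and sort it once: O(n) + O(n log n)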
The second example uses bisect.insort. This utilizes a list and binary search to ensure that the list is sorted at all times.
Essentially, on every insert, it uses binary search to find the correct location for the new value, and then shifts all subsequent items to the right to make room at that index. The binary search is cheap, O(log n), so that is not the problem. A single shift is also not too bad: in the worst case, all items move one index to the right, which is O(n) (this is basically the cost of the insert operation on lists).
So a single insert is linear time at worst. However, we do this on every single iteration: when inserting n elements, we pay O(n) each time, which results in a quadratic overall complexity, O(n²). This is the problem, and it is what ultimately slows the whole thing down.
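To make the per-insert work concrete, here is what each insort call amounts to (per the bisect documentation, insort_right(a, x) is equivalent to a bisect followed by an insert):

import bisect

def insort_sketch(a, x):
    """What bisect.insort(a, x) does: binary search, then a shifting insert."""
    i = bisect.bisect_right(a, x)  # find the insertion point: O(log n)
    a.insert(i, x)                 # shift everything after i: O(n)

b = [1, 3, 5]
insort_sketch(b, 4)
print(b)  # [1, 3, 4, 5]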
So what does this tell us? Inserting into a list one element at a time to obtain a sorted result is not really performant. The bisect module is useful for keeping an already sorted list ordered when we only perform a few insertions, but when we start from unsorted data, it is cheaper to sort the data as a whole.
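For example, a sketch of the case where insort is the right tool (the values are illustrative):

import bisect

scores = [8, 15, 23, 42]        # already sorted
for new in (10, 30):            # a few incoming values
    bisect.insort(scores, new)  # O(n) per insert, no full re-sort
print(scores)                   # [8, 10, 15, 23, 30, 42]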