Why bisect slower than sort

Tags:

I know that bisect is using binary search to keep lists sorted. However I did a timing test that the values are being read and sorted. But, on the contrary to my knowledge, keeping the values and then sorting them win the timing by high difference. Could more experienced users please explain this behavior ? Here is the code I use to test the timings.

import timeit

setup = """
import random
import bisect
a = range(100000)
random.shuffle(a)
"""

p1 = """
b = []
for i in a:
    b.append(i)
b.sort()
"""

p2 = """
b = []
for i in a:
    bisect.insort(b, i)
"""

print timeit.timeit(p1, setup=setup, number = 1)
print timeit.timeit(p2, setup=setup, number = 1)

# 0.0593081859178
# 1.69218442959
# Huge difference ! 35x faster.

In the first process I take values one-by-one instead of just sorting a to obtain a behavior like file reading. And it beats bisect very hard.

405

asked Jul 26 '16 18:07

Max Paython

3 Answers

Sorting a list takes about O(N*log(N)) time. Appending N items to a list takes O(N) time. Doing these things consecutively takes about O(N*log(N)) time.

Bisecting a list takes O(log(n)) time. Inserting an item into a list takes O(N) time. Doing both N times inside a for loop takes O(N * (N + log(n))) == O(N^2) time.

O(N^2) is worse than O(N*log(N)), so your p1 is faster than your p2.

answered Oct 17 '22 10:10

Kevin

Your algorithmic complexity will be worse in the bisect case ...

In the bisect case, you have N operations (each at an average cost of log(N) to find the insertion point and then an additional O(N) step to insert the item). Total complexity: O(N^2).

With sort, you have a single Nlog(N) step (plus N O(1) steps to build the list in the first place). Total complexity: O(Nlog(N))

Also note that sort is implemented in very heavily optimized C code (bisect isn't quite as optimized since it ends up calling various comparison functions much more frequently...)

answered Oct 17 '22 09:10

mgilson

To understand the time difference, let’s look at what you are actually doing there.

In your first example, you are taking an empty list, and append items to it, and sorting it in the end.

Appending to lists is really cheap, it has an amortized time complexity of O(1). It cannot be really constant time because the underlying data structure, a simple array, eventually needs to be expanded as the list grows. This is done every so often which causes a new array to be allocated and the data being copied. That’s a bit more expensive. But in general, we still say this is O(1).

Next up comes the sorting. Python is using Timsort which is very efficient. This is O(n log n) at average and worst case. So overall, we get constant time following O(n log n) so the sorting is the only thing that matters here. In total, this is pretty simple and very fast.

The second example uses bisect.insort. This utilizes a list and binary search to ensure that the list is sorted at all times.

Essentially, on every insert, it will use binary search to find the correct location to insert the new value, and then shift all items correctly to make room at that index for the new value. Binary search is cheap, O(log n) on average, so this is not a problem. Shifting alone is also not that difficult. In the worst case, we need to move all items one index to the right, so we get O(n) (this is basically the insert operation on lists).

So in total, we would get linear time at worst. However, we do this on every single iteration. So when inserting n elements, we have O(n) each time. This results in a quadratic complexity, O(n²). This is a problem, and will ultimately slow the whole thing down.

So what does this tell us? Sorted inserting into a list to get a sorted result is not really performant. We can use the bisect module to keep an already sorted list ordered when we only do a few operations, but when we actually have unsorted data, it’s easier to sort the data as a whole.

answered Oct 17 '22 11:10

poke

Related questions
                            
                                Column filtering in PySpark
                            
                                How to merge two databases of mongoDB present in two different computers?
                            
                                Python logging won't set a logging level using basicConfig
                            
                                Python Paramiko send CTRL+C to an ssh shell
                            
                                From JPG to b64encode to cv2.imread()
                            
                                Python library for data scaling, centering and Box-Cox transformation
                            
                                Generate x+xx+xxx+xxxx ... for a given integer (for 4 -> 4+44+444...)
                            
                                AttributeError: 'RegexURLPattern' object has no attribute '_callback'
                            
                                How can I use versioning in S3 with boto3?
                            
                                Can I dynamically pass a list of arguments to a python function with predefined arguments? [duplicate]
                            
                                Django Allauth Error: "'socialaccount' is not a registered tag library"
                            
                                Matplotlib animation inside your own GUI
                            
                                Easiest way to generate random int64 array in numpy?
                            
                                Setting kivy window size not working
                            
                                TypeError: 'numpy.float64' object is not callable?
                            
                                How to get Windows window names with ctypes in python
                            
                                No module named bidi.algorithm
                            
                                Hard limiting / threshold activation function in TensorFlow
                            
                                python lambda list filtering with multiple conditions
                            
                                Add/subtract dataframes with different column labels

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Why bisect slower than sort

Tags:

performance

python

algorithm

sorting