Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

bisect.insort complexity not as expected

trying to find the most optimal data structure in python3 for a frotier problem i have to develop have just realised that the complexity of using the module bisect to make a real time ordered insert is not O(nlog n) as it should be and grows exponentially instead. do not know the reasoning of it so felt like asking u guys just in case know something about it since i find it really interesting.

think im using the module right so it shouldn't be a problem on my end, anyways here is the code used to insert node objects determining that insertion by a random f value nodes have.

bisect.insort(self._frontier, (node._f, node))

getting lots of objects in a few seconds, but then not that many over time. Bakuriu suggested me asking this question since he also found it interesting after doing some tests and ending up with same results as me. the code he used to test that out was the following:

python3 -m timeit -s 'import bisect as B; import random as R;seq=[]' 'for _ in range(100000):B.insort(seq, R.randint(0, 1000000))'

An these were his conclusions:

10k insertions is all fine (80ms and up to that point it basically scales linearly [keep in mind that it is O(nlog n) so it's a little bit worse than linear]) but with 100k it takes forever instead of 10 times more. A list of 100k elements isn't really that big and log(100k) is 16 so it's not that big.

any help will be much appreciated!

like image 659
jupcan Avatar asked Oct 27 '18 15:10

jupcan


2 Answers

You probably missed that the time complexity for insort is O(n) and this is documented clearly, for bisect.insort_left():

Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step.

Finding the insertion point is cheap, but inserting into a Python list is not, as the elements past the insertion point have to be moved up a step.

Also see the TimeComplexity page on the Python Wiki, where list insertion is documented:

Insert O(n)

You can find the insertion point in O(log n) time, but the insertion step that follows is O(n), making this a rather expensive way to sort.

If you are using this to sort m elements, you have a O(m^2) (quadratic) solution for what should only take O(m log m) time with TimSort (the sorting algorithm used by the sorted() function).

like image 160
Martijn Pieters Avatar answered Oct 01 '22 04:10

Martijn Pieters


Binary search takes O(log n) comparisons, but insort isn't just a binary search. It also inserts the element, and inserting an element into a length-n list takes O(n) time.

The _frontier naming in your original code snippet suggests some sort of prioritized search algorithm. A heap would probably make more sense for that, or a SortedList from sortedcollections.

like image 41
user2357112 supports Monica Avatar answered Oct 01 '22 06:10

user2357112 supports Monica