Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - efficiently find where something would land in a sorted list?

Tags:

python

sorting

I have a list:

x = ['c', 'a', 'e']

I can sort this list:

x_sorted = sorted(x)

x_sorted is now ['a', 'c', 'e']

Now let's say I have a new variable y = 'd'

I want to find out where in x_sorted this new variable would fall. In this example the new variable y contains the string 'd' so it would be placed as ['a', 'c', 'd', 'e'] in the index 2 of the list. I desire to find out this index number as efficiently as possible (since I have to repeat this process many times).

Here is a function I wrote which does the task very simply:

def f(x_sorted, y):
    new_list = x_sorted[:] + [y]
    return sorted(new_list).index(y)

This gives me the correct answer.

I am wondering if there is a better more efficient way of doing this, as f will be called 100,000+ times.

Thanks in advance!

like image 614
applecider Avatar asked Dec 20 '22 04:12

applecider


2 Answers

You can use bisect

from bisect import bisect

l = ['a', 'c', 'e']

print(bisect(l,"d"))
2

To add it to the list:

from bisect import insort


l = ['a',"b", 'c', 'e']

insort(l, "d")
print(l)
insort(l, "f")
print(l)

['a', 'b', 'c', 'd', 'e']
['a', 'b', 'c', 'd', 'e', 'f']

If you want a faster insert you could use a blist where maintaining a sorted list with insort is:

O(log**2 n)  vs  O(n)

from bisect import insort

from blist import blist

b = blist(["a", "b", "c", "e"])
insort(b, "f")
insort(b, "d")
print(b)
blist(['a', 'b', 'c', 'd', 'e', 'f'])

There is also a blist.sortedlist list where you can use .add:

from blist import sortedlist

l = ['b',"a", 'c', 'e']
b = sortedlist(l)

b.add("f")
print(b)
sortedlist(['a', 'b', 'c', 'e', 'f'])

There is also a sortedcontainers library that has a sortedlist implementation.

like image 173
Padraic Cunningham Avatar answered Dec 21 '22 18:12

Padraic Cunningham


If x doesn't change, or changes infrequently, you can pre-sort it and then use binary search on the sorted list. This will incur O(n logn) cost for every sort plus O(logn) for every subsequent lookup.

If x changes a lot, you could use linear search:

>>> x = ['c', 'a', 'e']
>>> y = 'd'
>>> sum(y > el for el in x)
2

This has O(n) lookup complexity.

like image 34
NPE Avatar answered Dec 21 '22 16:12

NPE