What is Big-O complexity of random.choice(list) in Python3, where n is amount of elements in a list?
Edit: Thank You all for give me the answer, now I understand.
The complexity of random. choice(list) is O(log n) where n is the number of elements in the list. The cpython implementation uses _randbelow(len(seq)) to get a pseudo-random index and then returns the item at that index.
Time complexity is commonly estimated by counting the number of elementary operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform. When analyzing the time complexity of an algorithm we may find three cases: best-case, average-case and worst-case.
Here is the summary for in : list - Average: O(n) set/dict - Average: O(1), Worst: O(n)
O(1)
. Or to be more precise, it's equivalent to the big-O random access time for looking up a single index in whatever sequence you pass it, and list
has O(1)
random access indexing (as does tuple
). Simplified, all it does is seq[random.randrange(len(seq))]
, which is obviously equivalent to a single index lookup operation.
An example where it would be O(n)
is collections.deque
, where indexing in the middle of the deque
is O(n)
(with a largish constant divisor though, so it's not that expensive unless the deque
is reaching the thousands of elements range or higher). So basically, don't use a deque
if it's going to be large and you plan to select random elements from it repeatedly, stick to list
, tuple
, str
, byte
/bytearray
, array.array
and other sequence types with O(1)
indexing.
Though the question is about random.choice
and previous answers on it have several explanations, when I searched for the complexity of np.random.choice
, I didn't find an answer, so I decide to explain about np.random.choice
.
choice(a, size=None, replace=True, p=None). Assume a.shape=(n,)
and size=m
.
When with replacement:
The complexity for np.random.choice
is O(m) if p
is not specified (assuming it as uniform distribution), and is O(n + n log m ) if p
is specified.
The github code can be find here np.random.choice.
When p
is not specified, choice
generates an index array by randint
and returns a[index]
, so the complexity is O(m). (I assume the operation of generating a random integer by randint is O(1).)
When p
is specified, the function first computes prefix sum of p
. Then it draws m samples from [0, 1), followed by using binary search to find a corresponding interval in the prefix sum for each drawn sample. The evidence to use binary search can be found here. So this process is O(n + m log n). If you need a faster method in this situation, you can use Alias Method, which needs O(n) time for preprocessing and O(m) time to sample m items.
When without replacement: (It's kind of complicated, and maybe I'll finish it in the future.)
If p
is not specified, the complexity is the same as np.permutation(n)
, even when m is only 1. See more here.
If p
is specified, the expected complexity is at least $n \log n \log\frac{n}{n + 1 - m}$. (This is an upperbound, but not tight.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With