What is the Big-O complexity of random.choice(list) in Python 3, where n is the number of elements in the list? Edit: Thank you all for the answers; now I understand.

Big-O complexity of random.choice(list) in Python3

2 Answers

O(1). Or to be more precise, it's equivalent to the big-O random access time for looking up a single index in whatever sequence you pass it, and list has O(1) random access indexing (as does tuple). Simplified, all it does is seq[random.randrange(len(seq))], which is obviously equivalent to a single index lookup operation.
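As a rough illustration of that simplified form, here is a standard-library sketch. `choice_sketch` is a hypothetical name, and CPython's real `random.choice` may draw the index through a private helper rather than `randrange`. The work is the same either way: one `len()`, one random integer, one index lookup.

```python
import random


def choice_sketch(seq, rng=random):
    """Pick one element of seq uniformly at random (simplified sketch).

    len(seq) is O(1) for built-in sequences, drawing one random index is
    O(1), and seq[i] costs whatever indexing costs for this sequence type
    (O(1) for list, tuple, str, bytes, array.array).
    """
    if len(seq) == 0:
        # random.choice also raises IndexError on an empty sequence
        raise IndexError("Cannot choose from an empty sequence")
    return seq[rng.randrange(len(seq))]


rng = random.Random(42)
print(choice_sketch(["a", "b", "c", "d"], rng))  # one of "a", "b", "c", "d"
```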

An example where it would be O(n) is collections.deque, where indexing into the middle of the deque is O(n) (with a largish constant divisor, though, so it's not that expensive unless the deque grows to thousands of elements or more). So basically, don't use a deque if it's going to be large and you plan to select random elements from it repeatedly; stick to list, tuple, str, bytes/bytearray, array.array and other sequence types with O(1) indexing.
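If the data already lives in a large deque, following that advice could look like this sketch (hypothetical helper `sample_from_deque`, standard library only). It pays one O(n) copy into a tuple, and after that every pick is an O(1) index. The snapshot does not see later changes to the deque, so this only fits when you sample many times between modifications.

```python
import random
from collections import deque


def sample_from_deque(dq, k, rng=random):
    """Draw k elements (with replacement) from a deque.

    Calling rng.choice(dq) directly would index the deque, which is O(n)
    near the middle. Copying once into a tuple costs O(n); each pick from
    the tuple is then O(1), so k picks cost O(n + k) instead of up to O(k * n).
    """
    snapshot = tuple(dq)  # one O(n) copy; stale if dq is modified later
    return [rng.choice(snapshot) for _ in range(k)]


dq = deque(range(100_000))
print(sample_from_deque(dq, 5, random.Random(1)))
```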


answered Sep 20 '22 04:09

ShadowRanger

Though the question is about random.choice, and the existing answer already explains it, I couldn't find anything on the complexity of np.random.choice when I searched, so I decided to explain np.random.choice here.

The signature is choice(a, size=None, replace=True, p=None). Assume a.shape == (n,) and size == m.

With replacement:

The complexity of np.random.choice is O(m) if p is not specified (i.e., a uniform distribution is assumed), and O(n + m log n) if p is specified.

The source code on GitHub can be found here: np.random.choice.

When p is not specified, choice generates an index array with randint and returns a[index], so the complexity is O(m). (I assume that generating one random integer with randint is O(1).)
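A pure-Python sketch of the same two steps, for illustration only: the function name is hypothetical, `random.randrange` stands in for NumPy's `randint`, and a list comprehension stands in for `a[index]`. It is not NumPy's code, but it shows why the cost is O(m) and does not depend on n:

```python
import random


def uniform_choice_with_replacement(a, m, rng=random):
    """Return m items drawn uniformly, with replacement, from sequence a.

    Step 1 builds an index array of m random integers; step 2 gathers
    a[i] for each index. Both steps are O(m), assuming O(1) per random
    integer and O(1) indexing.
    """
    n = len(a)
    if n == 0 and m > 0:
        raise ValueError("cannot sample from an empty sequence")
    index = [rng.randrange(n) for _ in range(m)]  # like randint(0, n, size=m)
    return [a[i] for i in index]                   # like a[index]


print(uniform_choice_with_replacement(["x", "y", "z"], 4, random.Random(0)))
```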

When p is specified, the function first computes the prefix sum (cumulative sum) of p. Then it draws m samples from [0, 1) and uses binary search to find the interval of the prefix sum that each sample falls into. Evidence that it uses binary search can be found here. So this process is O(n + m log n): O(n) for the prefix sum plus O(log n) for each of the m lookups. If you need a faster method in this situation, you can use the alias method, which needs O(n) time for preprocessing and O(m) time to sample m items.
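Both approaches can be sketched in pure Python, assuming non-negative weights `p` with a positive sum (NumPy additionally requires `p` to sum to 1). This is not NumPy's code and the function names are hypothetical: `itertools.accumulate` and `bisect.bisect_left` stand in for `cumsum` and `searchsorted`, and the draw is taken from (0, total] rather than [0, 1) so the lookup can never run past the last index. The alias table follows Vose's variant of the alias method.

```python
import bisect
import random
from itertools import accumulate


def weighted_choice_bisect(a, p, m, rng=random):
    """Prefix sums + binary search: O(n) setup, then O(log n) per sample."""
    cdf = list(accumulate(p))  # O(n): cdf[i] = p[0] + ... + p[i]
    total = cdf[-1]
    out = []
    for _ in range(m):
        # x lies in (0, total]; item i owns the interval (cdf[i-1], cdf[i]],
        # whose width is p[i], so zero-weight items are never returned.
        x = (1.0 - rng.random()) * total
        out.append(a[bisect.bisect_left(cdf, x)])  # O(log n) lookup
    return out


def build_alias_table(p):
    """Vose's alias method: O(n) preprocessing into (prob, alias) tables."""
    n = len(p)
    total = sum(p)
    scaled = [w * n / total for w in p]  # rescale so the average weight is 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        lo = small.pop()
        hi = large.pop()
        prob[lo] = scaled[lo]  # column lo keeps its own mass...
        alias[lo] = hi         # ...and is topped up to 1 by item hi
        scaled[hi] = (scaled[hi] + scaled[lo]) - 1.0
        (small if scaled[hi] < 1.0 else large).append(hi)
    for i in small + large:
        prob[i] = 1.0  # leftovers are exactly 1 up to rounding error
    return prob, alias


def weighted_choice_alias(a, prob, alias, m, rng=random):
    """O(1) per sample: pick a column uniformly, then keep it or use its alias."""
    n = len(prob)
    out = []
    for _ in range(m):
        i = rng.randrange(n)
        out.append(a[i] if rng.random() < prob[i] else a[alias[i]])
    return out


a = ["a", "b", "c", "d"]
p = [0.1, 0.2, 0.3, 0.4]
print(weighted_choice_bisect(a, p, 5, random.Random(0)))
prob, alias = build_alias_table(p)
print(weighted_choice_alias(a, prob, alias, 5, random.Random(0)))
```

The bisect version costs O(n + m log n) per call; the alias version pays O(n) once in `build_alias_table` and O(1) per sample afterwards, which is what makes it attractive when the same `p` is sampled from many times.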

Without replacement: (It's somewhat complicated; maybe I'll finish this part in the future.)

If p is not specified, the complexity is the same as np.random.permutation(n), i.e. O(n), even when m is only 1. See more here.
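The shuffle-then-slice behaviour can be sketched with the standard library (hypothetical helper name; `random.shuffle` stands in for `np.random.permutation`). Because the whole index list is shuffled before slicing, the cost is O(n) no matter how small m is:

```python
import random


def uniform_choice_without_replacement(a, m, rng=random):
    """Sketch of the p=None, replace=False case: full shuffle, then slice.

    Building the index list is O(n) and random.shuffle is an O(n)
    Fisher-Yates shuffle, so the cost is O(n) even when m == 1.
    """
    n = len(a)
    if m > n:
        raise ValueError("cannot take a larger sample than the population")
    perm = list(range(n))  # O(n) just to build the index list
    rng.shuffle(perm)      # O(n) full shuffle, regardless of m
    return [a[i] for i in perm[:m]]


print(uniform_choice_without_replacement(list(range(1_000_000)), 1, random.Random(0)))
```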

If p is specified, the expected complexity is $O\left(n \log n \log\frac{n}{n + 1 - m}\right)$. (This is an upper bound, but not a tight one.)
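This case is complicated because sampling proceeds in rounds. The following pure-Python sketch is modeled on how NumPy's legacy `RandomState.choice` appears to handle it, so treat it as an illustration, not NumPy's code: the function name is hypothetical, and `accumulate`/`bisect_left` stand in for `cumsum`/`searchsorted`. Each round zeroes the weights of items already chosen, rebuilds the prefix sums in O(n), draws one candidate per missing slot, and discards duplicates. The total cost is therefore roughly O(n) per round times the number of rounds, and more rounds tend to be needed as m approaches n.

```python
import bisect
import random
from itertools import accumulate


def weighted_choice_without_replacement(a, p, m, rng=random):
    """Weighted sampling without replacement by repeated rounds (sketch).

    Assumes non-negative weights. Every round draws only from items that
    have not been chosen yet, so each round adds at least one new item.
    """
    if m > len(a):
        raise ValueError("cannot take a larger sample than the population")
    if sum(1 for w in p if w > 0) < m:
        raise ValueError("fewer non-zero weights than m")
    weights = list(p)  # private copy; chosen items get weight 0
    chosen = []
    seen = set()
    while len(chosen) < m:
        cdf = list(accumulate(weights))  # O(n) per round
        total = cdf[-1]
        for _ in range(m - len(chosen)):
            x = (1.0 - rng.random()) * total  # x in (0, total]
            i = bisect.bisect_left(cdf, x)    # never a zero-weight index
            if i not in seen:                 # drop duplicates within the round
                seen.add(i)
                chosen.append(i)
        for i in chosen:
            weights[i] = 0  # exclude chosen items from later rounds
    return [a[i] for i in chosen]


print(weighted_choice_without_replacement(list("abcde"), [5, 1, 1, 1, 2], 3, random.Random(0)))
```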

answered Sep 20 '22 04:09

Muzhi
