Is there any Python complexity reference? On cppreference, for example, many functions (such as `std::array::size` or `std::array::fill`) have a complexity section describing their runtime complexity, in terms such as "linear in the size of the container" or "constant".
I would expect the same information to appear on the Python website, at least for the CPython implementation. For example, in the list reference, for `list.insert` I would expect to see "complexity: linear". I know this case (and many other container-related operations) is covered here, but many other cases are not. Here are a few examples:
- `tuple.__le__`? It seems that when comparing two tuples of sizes n and k, the complexity is about O(min(n, k)) (however, for small n it looks different).
- `random.shuffle`? It appears to be O(n). It also appears that the complexity of `random.randint` is O(1).
- The `__format__` method of strings? It appears to be linear in the size of the input string; however, it also grows when the number of relevant arguments grows (compare `("{0}"*100000).format(*(("abc",)*100000))` with `("{}"*100000).format(*(("abc",)*100000))`).

I'm aware that (a) each of these questions may be answered by itself, (b) one may look at the code of these modules (even though some are written in C), and (c) StackExchange is not a Python mailing list for user requests. So: this is not a doc-feature request, just a question of two parts:
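One way to probe such questions empirically is to time the calls. Here is a quick `timeit` sketch for the `.format` comparison above; the sizes, repetition count, and lambda wrappers are illustrative choices, not anything prescribed:

```python
import timeit

# Rough timing sketch: explicitly indexed fields ("{0}") versus
# auto-numbered fields ("{}"). Building the template string with `*`
# is itself O(n), so this measures the combined cost.
n = 100_000
args = ("abc",) * n

t_indexed = timeit.timeit(lambda: ("{0}" * n).format(*args), number=5)
t_auto = timeit.timeit(lambda: ("{}" * n).format(*args), number=5)
print(t_indexed, t_auto)
```

Absolute numbers vary by machine; the interesting part is how the two timings scale as n grows.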
Computational complexity is a field of computer science that analyzes algorithms based on the amount of resources required to run them. The amount of required resources varies with the input size, so complexity is generally expressed as a function of n, where n is the size of the input.
To understand Python code complexity we can look at cyclomatic complexity (proposed by Thomas McCabe in 1976), a metric that measures the number of linearly independent paths through the control-flow graph of your code. Note that this is a static measure of code structure, not of running time.
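As a concrete sketch of the metric, cyclomatic complexity can be approximated as one plus the number of decision points in the code. The helper below is a simplified illustration of that idea (real tools such as radon implement the full definition):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """One plus the number of decision points in the code.

    A simplified approximation of McCabe's metric, for illustration only.
    """
    decision_nodes = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)
    tree = ast.parse(source)
    decisions = sum(isinstance(node, decision_nodes) for node in ast.walk(tree))
    return decisions + 1

sample = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(x):
        pass
    return "positive"
"""
print(cyclomatic_complexity(sample))  # if + elif + for -> 3 decisions -> 4
```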
Either method is O(n). In the first case each element of the list is visited once, hence O(n). In the second case the print() function itself iterates over the list, visiting each element once, also O(n). If you really want to, you can read the code in the CPython source repository.
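For reference, the two approaches being compared would look something like this (a sketch, since the original snippets aren't shown here):

```python
items = ["a", "b", "c"]

# Approach 1: an explicit loop -- n iterations, one print() call per element.
for item in items:
    print(item)

# Approach 2: a single print() call that iterates over the unpacked list,
# also visiting each element once -- O(n) either way.
print(*items, sep="\n")
```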
CPython is pretty good about its algorithms, and the time complexity of an operation is usually just the best you would expect of a good standard library.
For example:
- Tuple ordering has to be O(min(n, m)), because it works by comparing element-wise.
- `random.shuffle` is O(n), because that's the complexity of the modern Fisher–Yates shuffle.
- `.format` I imagine is linear, since it only requires one scan through the template string. As for the difference you see, CPython might just be clever enough to cache the same format code used twice.
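The short-circuiting behind the tuple bound is easy to check empirically. This `timeit` sketch (sizes are arbitrary) compares a pair that differs near the start against a pair that differs only at the last element:

```python
import timeit

n = 100_000
early = tuple(range(n))        # differs from `probe` already at index 1
late = (0,) * (n - 1) + (1,)   # equal to `probe` except at the last index
probe = (0,) * n

# Comparison stops at the first mismatching element, so `early <= probe`
# should be far cheaper than `late <= probe`.
t_early = timeit.timeit(lambda: early <= probe, number=1000)
t_late = timeit.timeit(lambda: late <= probe, number=1000)
print(t_early < t_late)
```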
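For the shuffle case, a minimal Fisher–Yates sketch shows why it is O(n): one swap per index. (CPython's `random.shuffle` follows the same scheme; this is a simplified re-implementation, not its actual code.)

```python
import random

def fisher_yates(seq):
    """In-place Fisher-Yates shuffle: one pass, one swap per index -> O(n)."""
    for i in range(len(seq) - 1, 0, -1):
        j = random.randrange(i + 1)     # uniform choice from seq[0..i]
        seq[i], seq[j] = seq[j], seq[i]

data = list(range(10))
fisher_yates(data)
print(sorted(data) == list(range(10)))  # same elements, just permuted
```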
The docs do mention time complexity, but generally only when it's not what you would expect: for example, because a `deque` is implemented with a doubly-linked list (of blocks), indexed access in the middle is explicitly documented as O(n).
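That trade-off is easy to observe. The following `timeit` sketch (sizes arbitrary) compares indexing into the middle of a deque versus a list:

```python
import timeit
from collections import deque

n = 100_000
d = deque(range(n))
lst = list(range(n))

# list indexing is O(1); deque indexing walks in from the nearer end,
# so hitting the middle costs O(n).
t_deque = timeit.timeit(lambda: d[n // 2], number=10_000)
t_list = timeit.timeit(lambda: lst[n // 2], number=10_000)
print(t_list < t_deque)
```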
Would the docs benefit from having time complexity called out everywhere it's appropriate? I'm not sure. The docs generally present builtins by what they should be used for and have implementations optimized for those use cases. Emphasizing time complexity seems like it would either be useless noise or encourage developers to second-guess the Python implementation itself.