 

Python list.clear() time and space complexity?

I am writing a blog post on Python's list.clear() method, where I also want to mention the time and space complexity of the underlying algorithm. I expected the time complexity to be O(N): iterate over the elements and free the memory. But I found an article where it is mentioned that it is actually an O(1) operation. Then I searched the source code of the method in the CPython implementation and found a method which I believe is the actual internal implementation of list.clear(), although I am not really sure it is. Here's the source code of the method:

static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        /* Because XDECREF can recursively invoke operations on
           this list, we make it empty first. */
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }
    /* Never fails; the return value can be ignored.
       Note that there is no guarantee that the list is actually empty
       at this point, because XDECREF may have populated it again! */
    return 0;
}

I could be wrong, but it does look like O(N) to me. Also, I found a similar question here, but there's no clear answer to it. I just want to confirm the actual time and space complexity of list.clear(), and perhaps a little explanation supporting the answer. Any help appreciated. Thanks.

asked Nov 30 '19 by Pankaj Mishra



1 Answer

As you correctly noticed, the CPython implementation of list.clear is O(n): the code iterates over the elements in order to decrease the reference count of each one, with no way to avoid it. There is no doubt that it is an O(n) operation and, given a large enough list, you can measure the time spent in clear() as a function of the list size:

import time

# Time list.clear() for lists of increasing size.
for size in 1_000_000, 10_000_000, 100_000_000, 1_000_000_000:
    l = [None] * size
    t0 = time.time()
    l.clear()
    t1 = time.time()
    print(size, t1 - t0)

The output shows linear complexity; on my system with Python 3.7 it prints the following:

1000000 0.0023756027221679688
10000000 0.02452826499938965
100000000 0.23625731468200684
1000000000 2.31496524810791

The time per element is of course tiny because the loop is coded in C and each iteration does very little work. But, as the above measurement shows, even a minuscule per-element factor eventually adds up. A small per-element constant is not a reason to ignore the cost of an operation, or the same would apply to the loop that shifts the list elements in l.insert(0, ...), which is also very efficient - and yet few would claim insertion at the beginning to be O(1). (And clear potentially does more work, because a decref will run an arbitrary chain of destructors for any object whose reference count actually reaches zero.)
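
To see that destructor effect in isolation, here is a small sketch (the helper name and sizes are mine, not from the answer) comparing a list that holds many references to a single shared object with a list of unique objects, where every decref drops a reference count to zero and triggers a deallocation:

import time

def time_clear(lst):
    t0 = time.perf_counter()
    lst.clear()
    return time.perf_counter() - t0

n = 10_000_000

# Many references to one shared object: no reference count reaches zero,
# so clear() only decrements counters.
shared = [None] * n

# Unique objects: every decref drops a count to zero, so each element
# is also deallocated during clear().
unique = [object() for _ in range(n)]

print("shared references:", time_clear(shared))
print("unique objects:", time_clear(unique))

Both runs should scale linearly with n; the second is merely expected to have a larger per-element constant, because deallocation work is added to the reference-count decrements.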

On a philosophical level, one could argue that the costs of memory management should be ignored when assessing complexity, because otherwise it would be impossible to analyze anything with certainty, as any operation could trigger a GC. This argument has merit; GC does come occasionally and unpredictably, and its cost can be considered amortized across all allocations. In a similar vein, complexity analysis tends to ignore the complexity of malloc, because the parameters it depends on (like memory fragmentation) are typically not directly related to the allocation size or even to the number of already allocated blocks. However, in the case of list.clear there is only one allocated block, no GC is triggered, and the code still visits each and every list element. Even with the assumption of O(1) malloc and amortized O(1) GC, list.clear still takes time proportional to the number of elements in the list.
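
As a quick sanity check that the cyclic garbage collector is not what makes clear() linear, the same measurement can be repeated with automatic collection disabled (a sketch; the sizes are chosen arbitrarily). The timings should still grow roughly linearly with the list size:

import gc
import time

gc.disable()  # turn off the cyclic collector; reference counting still runs
for size in 1_000_000, 10_000_000, 100_000_000:
    l = [None] * size
    t0 = time.perf_counter()
    l.clear()
    print(size, time.perf_counter() - t0)
gc.enable()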

The article linked from the question is about Python the language and doesn't mention a particular implementation. Python implementations that don't use reference counting, such as Jython or PyPy, are likely to have true O(1) list.clear, and for them the claim from the article would be entirely correct. So, when explaining the Python list on a conceptual level, it is not wrong to say that clearing the list is O(1) - after all, all the object references are in a contiguous array, and you free it only once. This is the point your blog post probably should make, and that is what the linked article is trying to say. Taking the cost of reference counting into account too early might confuse your readers and give them completely wrong ideas about Python's lists (e.g. they could imagine that they are implemented as linked lists).

Finally, at some point one must accept that the memory management strategy does change the complexity of some operations. For example, destroying a linked list in C++ is O(n) from the perspective of the caller; discarding it in Java or Go would be O(1). And not in the trivial sense of a garbage-collected language just postponing the same work for later - it is quite possible that a moving collector will only traverse reachable objects and will indeed never visit the elements of the discarded linked list. Reference counting makes discarding large containers algorithmically similar to manual collection, and GC can remove that. While CPython's list.clear has to touch every element to avoid a memory leak, it is quite possible that PyPy's garbage collector never needs to do anything of the sort, and thus has a true O(1) list.clear.

answered Sep 23 '22 by user4815162342