Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Internals of Python list, access and resizing runtimes

Is Python's [] a list or an array?
Is the access time of an index O(1) like an array or O(n) like a list?
Is appending/resizing O(1) like a list or O(n) like an array, or is it a hybrid that can manage O(1) for accessing and resizing?

I read here that array access is really slow in Python. However, when I wrote a memoized version of a recursive fibonacci procedure using both a dictionary (Python's dictionary is suppose to be really fast) and a list, they had equal times. Why is this?

Does a Python tuple have faster access times than a python list?

like image 984
daveeloo Avatar asked May 09 '11 03:05

daveeloo


People also ask

How list works internally in Python?

Python lists are internally represented as arrays. The idea used is similar to implementation of vectors in C++ or ArrayList in Java. The costly operations are inserting and deleting items near the beginning (as everything has to be moved). Insert at the end also becomes costly if preallocated space becomes full.

How does Python resize list?

To avoid the cost of resizing, Python does not resize a list every time you need to add or remove an item. Instead, every list has a number of empty slots which are hidden from a user but can be used for new items. If the slots are completely consumed Python over-allocates additional space for them.

How are Python lists stored in memory?

Python has a built-in module named 'array' which is similar to arrays in C or C++. In this container, the data is stored in a contiguous block of memory. Just like arrays in C or C++, these arrays only support one data type at a time, therefore it's not heterogenous like Python lists. The indexing is similar to lists.


1 Answers

Python's [] is implemented as an array, not a linked list. Although resizing is O(n), appending to it is amortized O(1), because resizes happen very rarely. If you're not familiar with how this works, read this Wikipedia entry on dynamic arrays. Python's list doesn't expand by a factor of 2 each time, it's a bit more complicated than that, but the expansions are still designed to make appending amortized O(1).

Inserting in the middle, however, is always an inefficient O(n), because n items may have to be moved.

Tuples aren't faster than lists - they're just immutable lists under the hood (*).

Regarding your dictionary test: depending on your exact implementation, caching in a list will be faster than with a dict. However, Python's dicts are highly optimized, and especially for small amounts of keys will perform great.


(*) Here's a list's "get item" C function in Python 2.6:

PyObject * PyList_GetItem(PyObject *op, Py_ssize_t i) {     if (!PyList_Check(op)) {         PyErr_BadInternalCall();         return NULL;     }     if (i < 0 || i >= Py_SIZE(op)) {         if (indexerr == NULL)             indexerr = PyString_FromString(                 "list index out of range");         PyErr_SetObject(PyExc_IndexError, indexerr);         return NULL;     }     return ((PyListObject *)op) -> ob_item[i]; } 

And this is a tuple's:

PyObject * PyTuple_GetItem(register PyObject *op, register Py_ssize_t i) {     if (!PyTuple_Check(op)) {         PyErr_BadInternalCall();         return NULL;     }     if (i < 0 || i >= Py_SIZE(op)) {         PyErr_SetString(PyExc_IndexError, "tuple index out of range");         return NULL;     }     return ((PyTupleObject *)op) -> ob_item[i]; } 

As you can see, they're almost exactly the same. In the end, after some type and bound checking, it's a simple pointer access with an index.

[Reference: Python documentation on Time Complexity for data type operations]

like image 148
Eli Bendersky Avatar answered Sep 25 '22 01:09

Eli Bendersky