I have tried some quick experiment comparing the performance of native Python lists with linked lists implementations such as this.
The native python lists are always faster than the non native linked list in the cases where they should not be (according the theory).
from linkedlist import *
import time
a = LinkedList()
b = []
for i in range(1000000):
b.append(i)
a.add_first(i)
t0 = time.clock()
a.remove(10)
t1 = time.clock()
b.remove(10)
t2 = time.clock()
print t1-t0
print t2-t1
The results I have on the test above are:
native linked list = 2.00000000001e-05
python list = 0.005576
non native linked list = 3.90000000001e-05
So, I was wondering why Python don't have a native Linked List data structure. In the case of Python, it looks to me that it could be useful algorithmically speaking to have Linked List instead of the standard Lists to speed up some aspects of the standard library.
My understanding is that the List data structure is a key building block of the language and that it makes the code more maintainable and easily optimizable to focus on that very data structure.
Is there any other reason?
To start with Python, it does not have a linked list library built into it like the classical programming languages. Python does have an inbuilt type list that works as a dynamic array but its operation shouldn't be confused with a typical function of a linked list.
Linked List Utility. Lists are one of the most popular and efficient data structures, with implementation in every programming language like C, C++, Python, Java, and C#.
They are too low level. You already have the array type, which is mostly implemented in native code and useful in the general case. One benefit of linked lists is quick removal of elements.
Using linked list binary search will take O(n) time. So binary search is inefficient with linked list.
Indeed, Python doesn't have a native linked list implementation, and I would also love to know why.
One alternative you may want to consider is collections.deque
(= "double-ended queue") which provides very fast constant-time O(1) insertion at both ends. In fact, for your specific example, a deque is a better choice than a linked list, since you don't need insertion in the middle.
However, in general, it is important to keep in mind that a deque is a different data structure than a linked list. A linked list also provides constant-time O(1) insertion in the middle, while a deque only provides linear-time O(n) insertion in the middle. In other words, the time it takes to insert an element in the middle of a deque is proportional to the current length n of the deque, and for a sufficiently large n, it will be slower than with a linked list.
Internally, collections.deque is in fact implemented using a linked list. However, this is an implementation detail, and more importantly, it is implemented as a linked list of fixed-size blocks of elements, not as a linked list of individual elements.
This is why insertion in the middle of a collections.deque is O(n) rather than O(1): you still need to modify around half the memory of the whole deque to accommodate a new element N:
before inserting N: [ABCDE]⇄[FGHIJ]⇄[KLM__]
after inserting N: [ABCDE]⇄[FGNHI]⇄[JKLM_]
changed memory: ^^^ ^^^^
Instead, with a true linked list (= of individual elements), inserting a new element N in the middle simply consists of allocating a new node and updating the value of four pointers, which is an operation whose performance is independent of the current size of the linked list:
before inserting N: [A]⇄[B]⇄[C]⇄[D]⇄[E]⇄[F]⇄[G]⇄ [H]⇄[I]⇄[J]⇄[K]⇄[L]⇄[M]
after inserting N: [A]⇄[B]⇄[C]⇄[D]⇄[E]⇄[F]⇄[G]⇄[N]⇄[H]⇄[I]⇄[J]⇄[K]⇄[L]⇄[M]
changed memory: ^ ^ ^
The trade-off is that the deque has better memory locality and requires less independent memory allocations. For example, inserting the new element N in the deque above didn't require any new memory allocation at all. This is why in practice, and especially if you're often inserting at the ends rather than in the middle, a deque is in fact a better choice than a linked list.
Note that while inserting elements in the middle of a deque is O(n), inserting new elements at the beginning or the end is O(1):
before: [ABCDE]⇄[FGNHI]⇄[JKLM_]
prepending P: [____P]⇄[ABCDE]⇄[FGNHI]⇄[JKLM_]
^ ^
prepending Q: [___QP]⇄[ABCDE]⇄[FGNHI]⇄[JKLM_]
^
appending R: [___QP]⇄[ABCDE]⇄[FGNHI]⇄[JKLMR]
^
appending S: [___QP]⇄[ABCDE]⇄[FGNHI]⇄[JKLMR]⇄[S____]
^ ^
Of course, for the linked list insertion to actually be O(1), this assumes that you already have a handle h
to the node before or after which you want to insert your new node n
. On a hypothetical implementation of LinkedList, this could look like:
n = linkedlist.insertbefore(h, "some value")
where:
type(h) # => <class 'Node'>
type(n) # => <class 'Node'>
n.value # => "some value"
n.next == h # => True
If you do not have such handle, then a function like insert(i, x)
will still be O(n), because finding the i-th element is O(n), even though the insert operation itself is O(1). Here is some hypothetical implementation of insert(i, x)
on our hypothetical LinkedList:
def insert(self, i, x):
node = self.node_from_index(i) # Find the i-th node: O(n)
return self.insertbefore(node, x) # Insert and return new node: O(1)
This means that linked lists are only worth it for certain problems when you do keep those node handles around. It also makes the API a bit less convenient, and despite each operation being O(1) if you're careful, the constant is generally much bigger. So in practice, they are not that often useful, which may be why they're aren't built-in linked lists in Python.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With