Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is deque implemented as a linked list instead of a circular array?

CPython deque is implemented as a doubly-linked list of 64-item sized "blocks" (arrays). The blocks are all full, except for the ones at either end of the linked list. IIUC, the blocks are freed when a pop / popleft removes the last item in the block; they are allocated when append/appendleft attempts to add a new item and the relevant block is full.

I understand the listed advantages of using a linked list of blocks rather than a linked list of items:

  • reduce memory cost of pointers to prev and next in every item
  • reduce runtime cost of doing malloc/free for every item added/removed
  • improve cache locality by placing consecutive pointers next to each other

But why wasn't a single dynamically-sized circular array used instead of the doubly-linked list in the first place?

AFAICT, the circular array would preserve all the above advantages, and maintain the (amortized) cost of pop*/append* at O(1) (by overallocating, just like in list). In addition, it would improve the cost of lookup by index from the current O(n) to O(1). A circular array would also be simpler to implement, since it can reuse much of the list implementation.

I can see an argument in favor of a linked list in languages like C++, where removal of an item from the middle can be done in O(1) using a pointer or iterator; however, python deque has no API to do this.

like image 738
max Avatar asked Jul 16 '17 23:07

max


People also ask

Why would we use a linked list instead of an array to implement a stack or a queue?

For the queue, a linked list would provide faster results when manipulating data in the middle of the queue (add/delete): O(1). If implemented with an array or vector, it would be O(n) because you have to move other elements to create the space for the new element, or fill the space of the deleted element.

Is a deque a linked list?

deque uses a linked list as part of its data structure. This is the kind of linked list it uses. With doubly linked lists, deque is capable of inserting or deleting elements from both ends of a queue with constant O(1) performance.

Is it better to implement a queue as an array or a linked list?

The linked list versions have better worst-case behavior, but may have a worse overall runtime because of the number of allocations performed. The array versions are slower in the worst-case, but have better overall performance if the time per operation isn't too important.

What is the difference between deque and linked list?

The general-purpose implementations include LinkedList and ArrayDeque classes. The Deque interface supports insertion, removal and retrieval of elements at both ends. The ArrayDeque class is the resizeable array implementation of the Deque interface, whereas the LinkedList class is the list implementation.


2 Answers

Adapted from my reply on the python-dev mailing list:

The primary point of a deque is to make popping and pushing at both ends efficient. That's what the current implementation does: worst-case constant time per push or pop regardless of how many items are in the deque. That beats "amortized O(1)" in the small and in the large. That's why it was done this way.

Some other deque methods are consequently slower than they are for lists, but who cares? For example, the only indices I've ever used with a deque are 0 and -1 (to peek at one end or the other of a deque), and the implementation makes accessing those specific indices constant-time too.

Indeed, the message from Raymond Hettinger referenced by Jim Fasarakis Hilliard in his comment:

https://www.mail-archive.com/[email protected]/msg25024.html

confirms that

The only reason that __getitem__ was put in was to support fast access to the head and tail without actually popping the value

like image 159
Tim Peters Avatar answered Oct 14 '22 07:10

Tim Peters


In addition to accepting @TimPeters answer, I wanted to add a couple additional observations that don't fit into a comment format.

Let's just focus on a common use case where deque is used as a simple a FIFO queue.

Once the queue reaches its peak size, the circular array need no more allocations of memory from the heap. I thought of it as an advantage, but it turns out the CPython implementation achieved the same by keeping a list of reusable memory blocks. A tie.

While the queue size is growing, both the circular array and the CPython need memory from the heap. CPython needs a simple malloc, while the array needs the (potentially much more expensive) realloc (unless extra space happens to be available on the right edge of the original memory block, it needs to free the old memory and copy the data over). Advantage to CPython.

If the queue peaked out at a much larger size than its stable size, both CPython and the array implementation would waste the unused memory (the former by saving it in a reusable block list, the latter by leaving the unused empty space in the array). A tie.

As @TimPeters pointed out, the cost of each FIFO queue put / get is always O(1) for CPython, but only amortized O(1) for the array. Advantage to CPython.

like image 31
max Avatar answered Oct 14 '22 06:10

max