Why Use a Doubly Linked List and HashMap for an LRU Cache Instead of a Deque?

I have implemented the LRU Cache design problem on LeetCode using the conventional method (doubly linked list + hash map). For those unfamiliar with the problem, the implementation looks something like this: [image of the doubly linked list + hash map implementation]

I understand why this method is used (quick removal/insertion at both ends, fast access in the middle). What I am failing to understand is why someone would use both a HashMap and a LinkedList when one could simply use an array-based deque (ArrayDeque in Java, plain deque in C++). This deque allows easy insertion/deletion at both ends and quick access in the middle, which is exactly what you need for an LRU cache. You would also use less space, because you wouldn't need to store a pointer for each node.

Is there a reason why the LRU cache is almost universally designed (in most tutorials, at least) with the HashMap/LinkedList method as opposed to the deque/ArrayDeque method? Would the HashMap/LinkedList method have any benefits?

asked Dec 18 '22 by user3586940

2 Answers

When an LRU cache is full, we discard the Least Recently Used item.

If we're discarding items from the front of the queue, then we have to make sure the item at the front is the one that hasn't been used for the longest time.

We ensure this by making sure that an item goes to the back of the queue whenever it is used. The item at the front is then the one that hasn't been moved to the back for the longest time.

To do this, we need to maintain the queue on every put OR get operation:

  • When we put a new item in the cache, it becomes the most recently used item, so we put it at the back of the queue.

  • When we get an item that is already in the cache, it becomes the most recently used item, so we move it from its current position to the back of the queue.

Moving items from the middle to the end is not a deque operation and is not part of Java's Deque interface. It's also not supported efficiently by the circular array that ArrayDeque uses underneath. Doubly linked lists are used because they do support this operation efficiently; the sketch below shows how.
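For concreteness, here is one minimal Python sketch of that design (the sentinel-node layout and the helper names _unlink and _push_back are illustrative choices, not mandated by the problem; it assumes capacity >= 1):

class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.map = {}               # key -> node: O(1) lookup into the middle
        self.head = _Node()         # sentinel before the front (LRU end)
        self.tail = _Node()         # sentinel after the back (MRU end)
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        # O(1) removal from any position: the operation a circular array lacks.
        node.prev.next, node.next.prev = node.next, node.prev

    def _push_back(self, node):
        # O(1) insertion at the most-recently-used end.
        node.prev, node.next = self.tail.prev, self.tail
        node.prev.next = node
        self.tail.prev = node

    def get(self, key: int) -> int:
        if key not in self.map:
            return -1
        node = self.map[key]
        self._unlink(node)          # move to the back on every use...
        self._push_back(node)       # ...so the front stays least recently used
        return node.value

    def put(self, key: int, value: int) -> None:
        if key in self.map:
            node = self.map[key]
            node.value = value
            self._unlink(node)      # updating a key also counts as a use
            self._push_back(node)
            return
        if len(self.map) == self.capacity:
            lru = self.head.next    # the item at the front of the queue
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_back(node)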

answered Apr 23 '23 by Matt Timmermans


The purpose of an LRU cache is to support two operations in O(1) time: get(key) and put(key, value), with the additional constraint that least recently used keys are discarded first. Normally the keys are the parameters of a function call and the value is the cached output of that call.

Regardless of how you approach this problem, we can agree that you MUST use a hashmap: it's what maps a key already present in the cache to its value in O(1).

To deal with the additional constraint of least recently used keys being discarded first, you can use a LinkedList or an ArrayDeque. However, since we don't actually need access to the middle here, a LinkedList is better: you never pay for a resize.

Edit:

Mr. Timmermans discussed in his answer why ArrayDeques cannot be used in an LRU cache, due to the necessity of moving elements from the middle to the end. That being said, here is an implementation of an LRU cache that successfully submits on LeetCode using only appends and poplefts on the deque. Note that Python's collections.deque is implemented as a doubly linked list; however, we only use operations on collections.deque that are also O(1) in a circular array, so the algorithm stays the same regardless.

from collections import deque

class LRUCache:

    def __init__(self, capacity: int):
        self.capacity = capacity
        # key -> [value, number of copies of this key currently in the deque]
        self.hashmap = {}
        # Recency queue: oldest requests on the left, newest on the right.
        # Stale duplicate entries are tolerated and cleaned up lazily.
        self.deque = deque()

    def get(self, key: int) -> int:
        res = self.hashmap.get(key, [-1, 0])[0]
        if res != -1:
            # Re-inserting the key marks it as most recently used.
            self.put(key, res)
        return res

    def put(self, key: int, value: int) -> None:
        self.add(key, value)
        # Discard from the left until only `capacity` unique keys remain.
        while len(self.hashmap) > self.capacity:
            self.remove()

    def add(self, key, value):
        if key in self.hashmap:
            self.hashmap[key][1] += 1  # one more copy now lives in the deque
            self.hashmap[key][0] = value
        else:
            self.hashmap[key] = [value, 1]
        self.deque.append(key)

    def remove(self):
        k = self.deque.popleft()  # oldest request
        self.hashmap[k][1] -= 1
        if self.hashmap[k][1] == 0:  # that was its last copy: truly LRU
            del self.hashmap[k]
I do agree with Mr. Timmermans that the LinkedList approach is preferable, but I want to highlight that an ArrayDeque can also be used to build an LRU cache.

The main mixup between Mr. Timmermans and me is how we interpreted capacity: I took it to mean caching the last N get/put requests, while Mr. Timmermans took it to mean caching the last N unique items.

The above code does have a loop in put that slows it down, but the loop exists only to make the code conform to caching the last N unique items. If we instead had the code cache the last N requests, we could replace the loop with:

if len(self.deque) > self.capacity:
    self.remove()

This makes it as fast as, if not faster than, the linked-list variant.
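Put together, the per-request variant's put would look like this (a sketch reusing the same add and remove helpers; this version is illustrative and is not the one submitted above):

    def put(self, key: int, value: int) -> None:
        self.add(key, value)
        # Each add appends exactly one entry, so a single popleft is
        # enough to keep the deque bounded at `capacity` requests.
        if len(self.deque) > self.capacity:
            self.remove()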

Regardless of how capacity is interpreted, the above method still works as an LRU cache: least recently used elements get discarded first.

I just want to highlight that designing an LRU cache in this manner is possible. The source is right there: try submitting it on LeetCode!

answered Apr 22 '23 by Primusa