
How to implement LFU cache using STL?

I'm trying to implement LFU (Least Frequently Used) cache using pure STL (I don't want to use Boost!).

Requirements are:

  • Associative access to any element using a Key like with std::map.
  • Ability to release the lowest priority item (using its UsesCount attribute).
  • Ability to update priority (UsesCount) of any item.

The problems are:

  • If I use std::vector as the container of items (Key, Value, UsesCount), std::map as a container of iterators into the vector for associative access, and std::make_heap, std::push_heap and std::pop_heap as the priority queue implementation within the vector, the iterators in the map are no longer valid after heap operations.
  • If I use std::list (or std::map) instead of std::vector in the previous configuration, std::make_heap etc. can't be compiled because their iterators do not support arithmetic (they are not random-access iterators).
  • If I use std::priority_queue, I have no way to update an item's priority.

The questions are:

  • Am I missing some obvious way to solve this problem?
  • Can you point me to some pure C++/STL implementation of LFU cache meeting previous requirements as an example?

Thank you for your insights.

Blackhex asked Jul 10 '12 08:07


2 Answers

Your implementation using the *_heap functions and a vector seems to be a good fit, although it will result in slow updates. The iterator invalidation you encounter is normal for every container that uses a vector as its underlying data structure. boost::heap::priority_queue takes the same approach, but it does not provide a mutable interface for the reason mentioned above. Other boost::heap data structures offer the ability to update the heap.

Something that seems a little odd: even if you were able to use std::priority_queue, you would still face the iterator invalidation problem.

To answer your questions directly: you are not missing anything obvious. std::priority_queue is not as useful as it should be. The best approach is to write your own heap implementation that supports updates. Making it fully STL compatible (especially allocator aware) is rather tricky. Then implement the LFU cache on top of that.

For the first step, look at the Boost implementations to get an idea of the effort. I'm not aware of any reference implementation for the second.

To work around iterator invalidation you can always add a level of indirection into another container, although you should try to avoid it, as it adds cost and can get quite messy.
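As a rough illustration of a hand-written heap that supports updates, here is a minimal sketch that combines it with the indirection idea. It is not a reference implementation. The class name LfuCache, its get/put/size members and the capacity parameter are all made up for this example. It relies on the fact that std::map iterators stay valid when other elements are inserted or erased, so the heap vector stores map iterators and each map entry remembers its own index in the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical LFU cache (names are illustrative, not from any library).
// The map owns the entries; a binary min-heap of map iterators orders them
// by use count. Each entry stores its heap index, so a priority update
// needs no search.
template <typename Key, typename Value>
class LfuCache {
    struct Entry {
        Value value;
        std::size_t uses;
        std::size_t heap_pos;  // index of this entry inside heap_
    };
    typedef std::map<Key, Entry> Map;
    typedef typename Map::iterator MapIt;

    Map map_;
    std::vector<MapIt> heap_;  // min-heap ordered by Entry::uses
    std::size_t capacity_;

    // Put an iterator into a heap slot and record the slot in the entry.
    void place(std::size_t i, MapIt it) {
        heap_[i] = it;
        it->second.heap_pos = i;
    }
    void sift_up(std::size_t i) {
        MapIt moving = heap_[i];
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (heap_[parent]->second.uses <= moving->second.uses) break;
            place(i, heap_[parent]);
            i = parent;
        }
        place(i, moving);
    }
    void sift_down(std::size_t i) {
        MapIt moving = heap_[i];
        const std::size_t n = heap_.size();
        for (;;) {
            std::size_t child = 2 * i + 1;
            if (child >= n) break;
            if (child + 1 < n &&
                heap_[child + 1]->second.uses < heap_[child]->second.uses)
                ++child;  // pick the less-used child
            if (moving->second.uses <= heap_[child]->second.uses) break;
            place(i, heap_[child]);
            i = child;
        }
        place(i, moving);
    }
    void evict_least_used() {
        assert(!heap_.empty());
        MapIt victim = heap_.front();
        MapIt last = heap_.back();
        heap_.pop_back();
        if (!heap_.empty()) {
            place(0, last);
            sift_down(0);
        }
        map_.erase(victim);
    }

public:
    explicit LfuCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the value, or null on a miss. A hit bumps the count.
    Value* get(const Key& key) {
        MapIt it = map_.find(key);
        if (it == map_.end()) return nullptr;
        ++it->second.uses;
        sift_down(it->second.heap_pos);  // count grew, so it can only sink
        return &it->second.value;
    }

    // Inserts a new entry (evicting the least-used one if full) or
    // overwrites the value of an existing key without touching its count.
    void put(const Key& key, const Value& value) {
        MapIt it = map_.find(key);
        if (it != map_.end()) { it->second.value = value; return; }
        if (capacity_ == 0) return;
        if (map_.size() >= capacity_) evict_least_used();
        Entry e = { value, 1, heap_.size() };
        it = map_.insert(std::make_pair(key, e)).first;
        heap_.push_back(it);
        sift_up(it->second.heap_pos);
    }

    std::size_t size() const { return map_.size(); }
};
```

With this layout, lookup is O(log n) through the map, and both a count update and an eviction cost O(log n) in the heap. The sketch ignores allocator awareness and breaks ties between equal counts arbitrarily.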

pmr answered Oct 05 '22 01:10


A somewhat simpler approach than keeping two data structures:

  • just keep a map, which maps your keys to their value/use-count pair.
  • when the cache is full:
    • create a vector of iterators to the map elements (O(n))
    • use std::nth_element to find the worst 10% (O(n))
    • remove them all from the map (O(n log n))

So, adding a new element to the cache is common case O(log n), worst case O(n log n), and amortized O(log n).

Removing the worst 10% might be a bit drastic in an LFU cache, because new entries have to make it into the top 90% or they're cut. Then again, if you only remove one element, new entries still need to get off the bottom before the next new entry arrives, or they're cut, and they have less time to do so. So depending on why LFU is the right caching strategy for you, my change to it might be the wrong strategy, or it might still be fine.
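The single-map approach with batch eviction described above could be sketched roughly as follows. The class name BatchLfuCache, its members and the capacity parameter are invented for this example. Evicting at least one entry when 10% rounds down to zero is my own assumption, not part of the answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical cache (names are illustrative): one std::map from key to a
// (value, use count) pair, with the least-used ~10% evicted when full.
template <typename Key, typename Value>
class BatchLfuCache {
    typedef std::pair<Value, std::size_t> Slot;  // value and its use count
    typedef std::map<Key, Slot> Map;
    typedef typename Map::iterator MapIt;

    Map map_;
    std::size_t capacity_;

    static bool less_used(MapIt a, MapIt b) {
        return a->second.second < b->second.second;
    }

    void evict_batch() {
        // Collect an iterator to every map element: O(n).
        std::vector<MapIt> its;
        its.reserve(map_.size());
        for (MapIt it = map_.begin(); it != map_.end(); ++it) its.push_back(it);

        // Assumption: evict about 10% of the entries, but always at least one.
        std::size_t k = std::max<std::size_t>(1, its.size() / 10);

        // Partition so the k least-used entries end up in its[0 .. k-1].
        std::nth_element(its.begin(), its.begin() + (k - 1), its.end(),
                         less_used);

        // Erasing one map element leaves iterators to the others valid.
        for (std::size_t i = 0; i < k; ++i) map_.erase(its[i]);
    }

public:
    explicit BatchLfuCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the value, or null on a miss. A hit bumps the count.
    Value* get(const Key& key) {
        MapIt it = map_.find(key);
        if (it == map_.end()) return nullptr;
        ++it->second.second;
        return &it->second.first;
    }

    // Inserts a new entry with a count of 1, evicting a batch first if the
    // cache is full; an existing key only has its value overwritten.
    void put(const Key& key, const Value& value) {
        MapIt it = map_.find(key);
        if (it != map_.end()) { it->second.first = value; return; }
        if (capacity_ == 0) return;
        if (map_.size() >= capacity_) evict_batch();
        map_.insert(std::make_pair(key, Slot(value, 1)));
    }

    std::size_t size() const { return map_.size(); }
};
```

For instance, with a capacity of 20 and two entries that were never read again after insertion, adding a 21st key drops exactly those two entries and leaves 19 in the cache. Which entries go when counts are tied is left unspecified by std::nth_element.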

Steve Jessop answered Oct 05 '22 03:10