
Associative cache simulation - Dealing with a Faulty Scheme

While working on a simulation of a fully associative cache (in MIPS assembly), a couple of questions came to mind based on some notes I read online.

According to some notes from the University of Maryland:

Finding a slot: At most, one slot should match. If there is more than one slot that matches, then you have a faulty fully-associative cache scheme. You should never have more than one copy of the cache line in any slot of a fully-associative cache. It's hard to maintain multiple copies, and doesn't make sense. The slots could be used for other cache lines.

Does that mean that I should check the whole tag list every time in order to detect a second match? After all, if I don't, I will never "realize" that the cache is faulty; yet checking every single time seems quite inefficient.
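One observation worth making: a fully associative lookup has to compare the requested tag against every slot anyway, so counting matches instead of stopping at the first one costs no extra passes over the tag list. A minimal C sketch (the `slot_t` layout and `find_slot` name are illustrative, not from any particular simulator):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 8

/* Hypothetical slot layout for a simulator. */
typedef struct {
    bool     valid;
    uint32_t tag;
} slot_t;

/* A fully associative lookup must compare every tag anyway, so
 * counting matches is free. Returns the index of the matching
 * slot, or -1 on a miss; *dup is set when the scheme is faulty
 * (more than one valid slot matched the same tag). */
int find_slot(const slot_t cache[NUM_SLOTS], uint32_t tag, bool *dup)
{
    int found = -1;
    *dup = false;
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (cache[i].valid && cache[i].tag == tag) {
            if (found >= 0)
                *dup = true;   /* faulty scheme detected */
            else
                found = i;
        }
    }
    return found;
}
```

In a software simulator the `*dup` check is essentially a debug assertion that rides along with the lookup you already have to do.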

In case I do check, and somehow manage to find a second match (meaning a faulty cache scheme), what should I do then? The best answer would obviously be to fix my implementation, but I'm interested in how to handle it during execution if this situation should arise.

Carlos asked Oct 13 '22


1 Answer

If more than one valid slot matches an address, that means that when a previous search for the same address was executed, either a valid slot that should have matched was not used (perhaps because it was never checked in the first place), or more than one invalid slot was used to store a line that wasn't in the cache at all.

Without a doubt, this should be considered a bug.

But if we've decided not to fix the bug (maybe we'd rather not commit that much hardware to a better implementation), the most obvious option is to pick one of the slots and invalidate it. It then becomes available for other cache lines.

As for how to pick which one to invalidate: if one of the duplicate lines is clean, invalidate that one in preference to a dirty cache line. If more than one cache line is dirty and they disagree, you have an even bigger bug to fix; at any rate your cache is out of sync, and it probably doesn't matter which you pick.
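That recovery policy is small enough to sketch directly. Assuming a `dirty` bit per slot (field names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;
    bool     dirty;
    uint32_t tag;
} slot_t;

/* Recovery policy from above: given two slots holding the same
 * line, invalidate a clean one in preference to a dirty one.
 * If both are clean, either choice is fine; if both are dirty,
 * the cache is already out of sync and it doesn't matter which
 * we drop. Returns the index of the slot that was invalidated. */
int resolve_duplicate(slot_t *a, int ia, slot_t *b, int ib)
{
    if (!a->dirty) { a->valid = false; return ia; }
    if (!b->dirty) { b->valid = false; return ib; }
    a->valid = false;   /* both dirty: pick one arbitrarily */
    return ia;
}
```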

Edit: here's how I might implement hardware to do this:

First off, it doesn't make a whole lot of sense to start from the assumption of duplicates; rather, we'll work around that at the appropriate point later. There are a few possibilities for what must happen when caching a new line:

  • The line is already in the cache: no action is needed.
  • The line is not in the cache, but there are invalid slots available: place the new line into one of the available slots.
  • The line is not in the cache and there are no invalid slots available: another valid line must be evicted, and the new line takes its place.
    • Picking an eviction candidate has performance consequences. A clean cache line can be evicted for free, but a poor choice can cause another cache miss in the near future. Consider the case where all but one cache line is dirty: if the one clean line is always the one evicted, then sequential reads alternating between two addresses will cause a cache miss on every read. Cache invalidation is one of the two hard problems in computer science (the other being 'naming things') and is out of scope for this question.
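The three cases above can be sketched as a single allocation routine. This is illustrative simulator code, not the hardware itself, and the trivial clean-first victim choice stands in for a real eviction policy:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4

typedef struct { bool valid, dirty; uint32_t tag; } slot_t;

/* Illustrative eviction choice: prefer a clean victim, fall back
 * to slot 0. Real policies (LRU, random, ...) are out of scope. */
static int pick_victim(const slot_t cache[NUM_SLOTS])
{
    for (int i = 0; i < NUM_SLOTS; i++)
        if (!cache[i].dirty)
            return i;
    return 0;
}

/* The three cases from the list above, in order:
 * hit, free invalid slot, forced eviction. */
int cache_line(slot_t cache[NUM_SLOTS], uint32_t tag)
{
    for (int i = 0; i < NUM_SLOTS; i++)           /* already cached */
        if (cache[i].valid && cache[i].tag == tag)
            return i;
    for (int i = 0; i < NUM_SLOTS; i++)           /* invalid slot free */
        if (!cache[i].valid) {
            cache[i] = (slot_t){ true, false, tag };
            return i;
        }
    int v = pick_victim(cache);                   /* must evict */
    cache[v] = (slot_t){ true, false, tag };
    return v;
}
```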

I would probably implement a search that checks for the correct slot to act on in each of these cases. Another block would then pick the first result from that list and act on it.

Now, getting back to the question: under what conditions could duplicates possibly enter the cache? If memory accesses are strictly ordered and the implementation (as above) is correct, I don't think duplicates are possible at all, and thus there's no need to check for them.

Now let's consider a more implausible case, where a single cache is shared between two CPU cores. We'll do the simplest thing that could work and duplicate everything except the cache memory itself for each core, so the slot-searching hardware is not shared. To support this, an extra bit per slot is used as a mutex: the search hardware cannot use a slot that is locked by the other core. Specifically:

  • If the address is in the cache, try to lock the slot and return that slot. If the slot is already locked, stall until it is free.
  • If the address is not in the cache, find an unlocked slot that is invalid or valid but evictable.
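The per-slot mutex bit might look like the following sketch. A real cache would stall the requesting core; here stalling is modeled by returning a `STALL` sentinel and letting the caller retry (all names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4
#define STALL (-2)   /* slot held by the other core: caller must retry */
#define MISS  (-1)   /* caller falls back to evicting an unlocked valid slot */

typedef struct { bool valid, dirty, locked; uint32_t tag; } slot_t;

/* Per-slot mutex bit from the text: a core may not use a slot
 * locked by the other core. On a hit, the slot is locked and
 * returned; on a miss, an unlocked invalid slot is claimed. */
int search_and_lock(slot_t cache[NUM_SLOTS], uint32_t tag)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        if (cache[i].valid && cache[i].tag == tag) {
            if (cache[i].locked)
                return STALL;          /* wait until it is free */
            cache[i].locked = true;
            return i;
        }
    for (int i = 0; i < NUM_SLOTS; i++)
        if (!cache[i].valid && !cache[i].locked) {
            cache[i].locked = true;
            return i;
        }
    return MISS;
}
```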

In this case we actually can end up with two slots sharing the same address: if both cores try to write to an address that is not in the cache, they will get different slots, and a duplicate line will occur. First, let's think about what could happen:

  • Both lines were reads from main memory. They will be the same value and they will both be clean. It is correct to evict either.
  • Both lines were writes. Both will be dirty, but probably not equal. This is a race condition that should have been resolved by the application by issuing memory fences or other memory-ordering instructions. We cannot guess which one should win; if there were no cache, the race condition would persist into RAM. It is correct to evict either.
  • One line was a read and one was a write. The write is dirty but the read is clean. Once again, this race condition would have persisted into RAM had there been no intervening cache, though the reader could have seen a different value. Evicting the clean line agrees with what RAM would hold, and also has the side effect of always favoring read-then-write ordering.

So now we know what to do about it, but where does this logic belong? First, let's think about what could happen if we do nothing: a subsequent cache access for the same address on either core could return either line. Even if neither core is issuing writes, reads could keep coming up different, alternating between the two values. This breaks every conceivable notion of memory ordering.

One solution might be to just say that dirty lines belong to one core only: to the other core, such a line is not simply dirty, but dirty and owned by another core, and therefore invisible.

  • In the case of two concurrent reads, both lines are identical, unlocked and interchangeable. It doesn't matter which line a core gets for subsequent operations.
  • In the case of concurrent writes, both lines are out of sync but mutually invisible. Although the race condition this creates is unfortunate, it still leads to a reasonable memory ordering: it is as if all of the operations on the discarded line happened before any of the operations on the line that is eventually kept.
  • If a read and a write happen concurrently, the dirty line is invisible to the reading core. However, the clean line is visible to both cores, which would cause memory ordering to break down for the writer; future writes could even cause both lines to end up dirty at once.

That last case pretty much dictates that dirty lines be preferred over clean ones. This forces at least some extra hardware to look for dirty lines first and clean lines only if no dirty line was found. So now we have a new concurrent cache implementation:

  • If the address is in the cache, dirty, and owned by the requesting core, use that slot.
  • If the address is in the cache but clean:
    • for reads, just use that slot;
    • for writes, mark the slot as dirty and use it.
  • If the address is not in the cache and there are invalid slots, use an invalid slot.
  • If there are no invalid slots, evict a line and use its slot.
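The rules above can be sketched as one lookup function. The per-slot `owner` field is an assumption of this sketch (the text only requires that a dirty line be attributable to one core), and the arbitrary victim in the last step stands in for a real eviction policy:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4

typedef struct { bool valid, dirty; int owner; uint32_t tag; } slot_t;

/* The four rules above, in order. `core` is the requesting core
 * and `is_write` marks a store. A line that is dirty and owned
 * by the other core is skipped as if it were not cached at all.
 * Returns the index of the slot used. */
int concurrent_access(slot_t cache[NUM_SLOTS], uint32_t tag,
                      int core, bool is_write)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!cache[i].valid || cache[i].tag != tag)
            continue;
        if (cache[i].dirty && cache[i].owner == core)
            return i;                   /* rule 1: own dirty line */
        if (!cache[i].dirty) {          /* rule 2: clean line */
            if (is_write) {             /* a write takes ownership */
                cache[i].dirty = true;
                cache[i].owner = core;
            }
            return i;
        }
        /* dirty and owned by the other core: invisible, keep looking */
    }
    for (int i = 0; i < NUM_SLOTS; i++)
        if (!cache[i].valid) {          /* rule 3: invalid slot */
            cache[i] = (slot_t){ true, is_write, core, tag };
            return i;
        }
    /* rule 4: evict. Arbitrary victim for the sketch. */
    cache[0] = (slot_t){ true, is_write, core, tag };
    return 0;
}
```

Note how the "invisible" rule makes duplicates harmless rather than impossible: a second core that cannot see the first core's dirty line simply allocates its own slot for the same address.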

We're getting closer, but there's still a hole in the implementation: what if both cores access the same address, just not concurrently? The simplest thing is probably to say that dirty lines really are invisible to other cores: being in the cache but dirty is the same as not being in the cache at all.

Now all we have to think about is providing the tool for applications to synchronize. I'd probably provide an operation that explicitly flushes a line if it is dirty. This would invoke the same hardware that is used during eviction, but mark the line as clean instead of invalid.
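A sketch of that flush operation, reusing a stand-in for the eviction writeback path (both names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SLOTS 4

typedef struct { bool valid, dirty; uint32_t tag; } slot_t;

/* Stand-in for the writeback hardware used during eviction. */
static void write_back(const slot_t *s) { (void)s; /* copy line to RAM */ }

/* The synchronization tool from the text: push a dirty line
 * through the eviction writeback path, but leave it valid and
 * clean rather than invalidating it. Returns true if a dirty
 * line for this address was found and flushed. */
bool flush_line(slot_t cache[NUM_SLOTS], uint32_t tag)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        if (cache[i].valid && cache[i].tag == tag && cache[i].dirty) {
            write_back(&cache[i]);
            cache[i].dirty = false;   /* clean, not invalid */
            return true;
        }
    return false;
}
```

Once flushed, the line is clean and therefore visible to the other core again, which is exactly the synchronization point the application asked for.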

To make a long post short: the idea is to deal with duplicates not by removing them, but by making sure they cannot lead to further memory-ordering issues, leaving the deduplication work to the application or to eventual eviction.

SingleNegationElimination answered Oct 18 '22