While working on simulating a fully associative cache (in MIPS assembly), a couple of questions came to mind based on some information I read online.
According to some notes from the University of Maryland:
Finding a slot: At most, one slot should match. If there is more than one slot that matches, then you have a faulty fully-associative cache scheme. You should never have more than one copy of the cache line in any slot of a fully-associative cache. It's hard to maintain multiple copies, and doesn't make sense. The slots could be used for other cache lines.
Does that mean I should scan the whole tag list on every access in order to check for a second match? If I don't, I will never notice the fault in the cache, yet checking every single time seems quite inefficient.
If I do check and somehow find a second match, meaning a faulty cache scheme, what should I do then? The best answer would be to fix my implementation, but I'm interested in how to handle this situation during execution should it arise.
If more than one valid slot matches an address, then an earlier search for the same address went wrong in one of two ways: either a valid slot that should have matched the address was not used (perhaps because it was not checked in the first place), or more than one invalid slot was used to store a line that wasn't in the cache at all.
Without a doubt, this should be considered a bug.
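In a software simulation, one way to catch this bug without paying for it on every access is a debug-only check that scans the whole tag list. Here is a minimal sketch in Python rather than MIPS, assuming a hypothetical `Slot` record with `valid` and `tag` fields (the names are mine, not from the question):

```python
from dataclasses import dataclass


@dataclass
class Slot:
    valid: bool = False
    tag: int = 0


def matching_slots(slots, tag):
    """Return the indices of every valid slot whose tag equals `tag`."""
    return [i for i, s in enumerate(slots) if s.valid and s.tag == tag]


def lookup(slots, tag, check_duplicates=False):
    """Return the index of the matching slot, or None on a miss.

    With check_duplicates=True the whole tag list is scanned and a
    duplicate raises; otherwise the search stops at the first match.
    """
    if check_duplicates:
        hits = matching_slots(slots, tag)
        if len(hits) > 1:
            raise RuntimeError(f"duplicate slots {hits} for tag {tag:#x}")
        return hits[0] if hits else None
    for i, s in enumerate(slots):
        if s.valid and s.tag == tag:
            return i
    return None
```

The full scan can be switched on while testing the simulator and off afterwards, so normal lookups stay as cheap as a first-match search.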
But if we've just decided not to fix the bug (maybe we'd rather not commit that much hardware to a better implementation) the most obvious option is to pick one of the slots to invalidate. It will then be available for other cache lines.
As for how to pick which one to invalidate: if one of the duplicate lines is clean, invalidate that one in preference to a dirty cache line. If more than one cache line is dirty and they disagree, you have an even bigger bug to fix, but at any rate your cache is out of sync and it probably doesn't matter which you pick.
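As a rough illustration of that policy, here is a sketch that keeps one copy and invalidates the rest. The `Slot` fields and the `resolve_duplicates` name are assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass
class Slot:
    valid: bool = False
    dirty: bool = False
    tag: int = 0


def resolve_duplicates(slots, tag):
    """Keep one valid slot for `tag` and invalidate the others.

    A dirty copy is kept in preference to a clean one. If several
    copies are dirty, the first is kept arbitrarily and the data in
    the others is dropped without a write-back.
    Returns the index of the surviving slot, or None on a miss.
    """
    hits = [i for i, s in enumerate(slots) if s.valid and s.tag == tag]
    if not hits:
        return None
    # min() returns the first dirty hit if there is one, else the first hit
    keep = min(hits, key=lambda i: not slots[i].dirty)
    for i in hits:
        if i != keep:
            slots[i].valid = False
    return keep
```

The invalidated slots become available for other cache lines on the next miss.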
Edit: here's how I might implement hardware to do this:
First off, it doesn't make a whole lot of sense to start with the assumption of duplicates; we'll work around that at the appropriate time later. There are a few possibilities for what must happen when caching a new line.
I would probably implement a search that checks for the correct slot to act on for each of these. Then another block would pick the first from that list and act on it.
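A software sketch of that search-then-select structure might look like the following. It assumes the possibilities are the usual three (the line is already cached, a free slot exists, or a line must be evicted) and uses a hypothetical `last_used` timestamp to pick an eviction victim; neither detail is spelled out above.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    valid: bool = False
    tag: int = 0
    last_used: int = 0  # assumed LRU timestamp, only used to pick a victim


def choose_slot(slots, tag):
    """Return (index, reason) for the slot that should hold `tag`.

    `slots` must not be empty.
    """
    # Each search yields a candidate index or None, like independent searches
    hit = next((i for i, s in enumerate(slots) if s.valid and s.tag == tag), None)
    free = next((i for i, s in enumerate(slots) if not s.valid), None)
    victim = min(range(len(slots)), key=lambda i: slots[i].last_used)
    # The selector takes the first candidate that exists, in priority order
    for index, reason in ((hit, "hit"), (free, "free"), (victim, "evict")):
        if index is not None:
            return index, reason
```

Because a hit always outranks a free slot, a line that is already cached is never stored a second time.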
Now, getting back to the question: what are the conditions under which duplicates could possibly enter the cache? If memory accesses are strictly ordered and the implementation (as above) is correct, I don't think duplicates are possible at all, and thus there's no need to check for them.
Now let's consider a more implausible case where a single cache is shared across two CPU cores. We're going to do the simplest thing that could work and duplicate everything except the cache memory itself for each core. Thus the slot-searching hardware is not shared. To support this, an extra bit per slot is used as a mutex. Search hardware cannot use a slot that is locked by the other core. Specifically,
In this case we actually can end up in a position where two slots share the same address. If both cores try to write to an address that is not in the cache, they will end up getting different slots, and a duplicate line will occur. First, let's think about what could happen:
So now we know what to do about it, but where does this logic belong? First, let's think about what could happen if we don't do anything. A subsequent cache access for the same address on either core could return either line. Even if neither core is issuing writes, reads could keep coming up different, alternating between the two values. This breaks every conceivable idea about memory ordering.
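The race can be reproduced with a small deterministic simulation. This is only a sketch: the `locked_by` field stands in for the per-slot mutex bit, the two-phase `claim_slot`/`commit` split is my assumption about how the search and fill steps interleave, and eviction is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    valid: bool = False
    tag: int = 0
    data: str = ""
    locked_by: Optional[int] = None  # per-slot mutex, recording the holder


def claim_slot(slots, core, tag):
    """Search phase for one core: lock a matching or free unlocked slot."""
    usable = [s for s in slots if s.locked_by is None]
    slot = next((s for s in usable if s.valid and s.tag == tag), None)
    if slot is None:
        slot = next((s for s in usable if not s.valid), None)
    if slot is not None:
        slot.locked_by = core
    return slot


def commit(slot, tag, data):
    """Fill phase: store the line and release the lock."""
    slot.valid, slot.tag, slot.data, slot.locked_by = True, tag, data, None


slots = [Slot() for _ in range(4)]
# Both cores miss on the same address before either one commits
a = claim_slot(slots, core=0, tag=0x40)
b = claim_slot(slots, core=1, tag=0x40)  # cannot use core 0's locked slot
commit(a, 0x40, "from core 0")
commit(b, 0x40, "from core 1")
copies = [s.data for s in slots if s.valid and s.tag == 0x40]
```

After both commits, `copies` holds two entries for the same tag, and which one a later read returns depends only on the order in which the search happens to visit the slots.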
One solution might be to say that dirty lines belong to one core only: a line is not simply dirty, but dirty and owned by a particular core.
That last case pretty much mandates that dirty lines be preferred to clean ones. This forces at least some extra hardware to look for dirty lines first and clean lines only if no dirty lines were found. So now we have a new concurrent cache implementation:
We're getting closer, but there's still a hole in the implementation. What if both cores access the same address, but not concurrently? The simplest thing is probably to say that dirty lines are invisible to other cores. A line that is in the cache but dirty is, for every other core, the same as a line that is not in the cache at all.
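Put together, the lookup rule could be sketched like this. The `owner` field and the `find_line` name are assumptions for the example, and the function models only the search order, not the locking:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    owner: Optional[int] = None  # core that dirtied the line (assumed field)


def find_line(slots, core, tag):
    """Return the slot index `core` should use for `tag`, or None on a miss."""
    hits = [i for i, s in enumerate(slots) if s.valid and s.tag == tag]
    # A core's own dirty copy wins over any clean copy
    for i in hits:
        if slots[i].dirty and slots[i].owner == core:
            return i
    # Dirty lines owned by another core are skipped as if they were absent
    for i in hits:
        if not slots[i].dirty:
            return i
    return None
```

With this rule each core always sees its own latest write, and a duplicate held by the other core can never be returned by mistake.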
Now all we have to think about is actually providing the tool for applications to synchronize. I'd probably provide an operation that explicitly flushes a line if it is dirty. This would invoke the same hardware that is used during eviction, but mark the line as clean instead of invalid.
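The relationship between eviction and flush can be sketched as two operations sharing one write-back step. Backing memory is modelled as a plain dictionary keyed by tag, which is an assumption made to keep the example small:

```python
from dataclasses import dataclass


@dataclass
class Slot:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: int = 0


def write_back(slot, memory):
    """Shared step: copy the line's data to backing memory."""
    memory[slot.tag] = slot.data


def evict(slot, memory):
    """Write back if dirty, then mark the slot invalid."""
    if slot.valid and slot.dirty:
        write_back(slot, memory)
    slot.valid = False
    slot.dirty = False


def flush(slot, memory):
    """Write back if dirty, but keep the line cached and mark it clean."""
    if slot.valid and slot.dirty:
        write_back(slot, memory)
        slot.dirty = False
```

Once a line has been flushed it is clean again, so under the visibility rule above it becomes visible to the other core.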
To make a long post short, the idea is to deal with the duplicates not by removing them, but by making sure they cannot lead to further memory ordering issues, and leaving the deduplication work to the application or eventual eviction.