
"data bit" capacity vs "overhead bit" size?

I am a little stuck because I cannot find anything that covers the "data" part of the cache; everything I have googled deals 99.9% with the addressing of caches. The question I was asked is worded as follows:

Contrast the difference between "data bit" capacity and "overhead bit" size
for the two caches.

I don't want the answer, so I am not going to post the actual set sizes and whatnot; I am just looking for a pointer, maybe to a website, or an explanation of how to "contrast" the two. Any help is much appreciated!

asked Jul 05 '12 by Serguei Fedorov


2 Answers

I'm not sure you've given us enough context for this question, but here goes.

Caches have to store not only the actual cached data, but also - for every piece of data - the key it is looked up by. So when you look up record N, the cache has to hold not only the value of record N, but also N itself - so that you can actually find the data. And that's a pretty simplistic way of looking at it. Caches may also hold other metadata, such as validity and last-access time for replacement decisions.
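As a rough sketch (my own illustration, not from the answer; the names are made up), one cache entry might conceptually hold:

    from dataclasses import dataclass

    @dataclass
    class CacheEntry:
        valid: bool  # overhead: does this entry hold real data?
        tag: int     # overhead: which address the data came from
        data: bytes  # the actual cached value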

Example #1: a cache of bytes in a 32-bit address space

Each cache entry has to store the data value (8 bits) plus the address (32 bits) = 40 bits per entry.

Example #2: a cache of 32-bit words in a 32-bit address space

Each cache entry has to store the data value (32 bits) plus the address (32 bits) = 64 bits per entry.

You can see that example #1 has significantly higher overhead relative to the data it stores.
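A quick back-of-the-envelope check of the two examples (a hypothetical helper, not part of the answer):

    def overhead_per_data_bit(data_bits, addr_bits):
        # Overhead here is just the stored address; real caches add
        # valid bits, replacement state, etc.
        return addr_bits / data_bits

    print(overhead_per_data_bit(8, 32))   # Example #1: 4.0
    print(overhead_per_data_bit(32, 32))  # Example #2: 1.0

Example #1 spends four overhead bits per data bit; example #2 only one.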

As always, Wikipedia may help. http://en.wikipedia.org/wiki/Cache_(computing)

answered Sep 17 '22 by Roddy


Caches store data, usually in SRAM-like data arrays, but also have overhead. I do not particularly like the terms "data bit size" and "overhead bit size", because (a) there is overhead that is not storage bit cells, and (b) not all bit cells are equally costly. But let's go along with those terms for now.

My take is that "overhead bit size" probably refers to the number of tag bits that need to be stored in order to access the cache. Often these are stored in a different array, a tag array separate from the data array. Compare to the number of data bits.

Here are three simple examples:

Consider a 32 KiB (kibibyte) cache, with 64 B (byte) cache lines. Typically, we would let bits 0-5 of the address be the cache-line offset.

32 KiB / (64 B/line) => 2^(5+10) / 2^6 => 2^9 => 512 cache lines.

---++ Example 1: Direct Mapped

Let us imagine that it is a direct mapped cache. Then we might take the next 9 bits, bits 6-14 of the address, as an "index" into the array of cache lines.

So far so good. Now, to figure out the tag, we need to know the full address width. Let us say that it is 64 bits (although most "64-bit" machines only implement 40 or 48 bits as of 2012). In order to distinguish a cache line from any other cache line that maps to the same entry in the cache, we need to store the remaining bits of the address, bits 15-63, 49 bits, as the tag.

An access to such a direct mapped cache then proceeds by extracting the index, reading out the tag and data with that index, comparing the tag read out to the tag of the address we are looking up, declaring a hit if they match and a miss if not, and so on.
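In Python-flavoured pseudocode, that lookup might be sketched like this (my own sketch, assuming the 6-bit offset and 9-bit index above; the names are illustrative):

    OFFSET_BITS = 6             # 64 B lines
    INDEX_BITS = 9              # 512 lines
    NUM_LINES = 1 << INDEX_BITS

    # Each entry: (valid, tag, data).
    cache = [(False, 0, bytes(64)) for _ in range(NUM_LINES)]

    def lookup(addr):
        index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        valid, stored_tag, data = cache[index]
        if valid and stored_tag == tag:
            return data         # hit
        return None             # miss: fetch from the next level and fill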

Overhead: 49 bits of tag for every 64B (512 bits) of data.

Total:

  • tag or "overhead": 512 * 49 bits = 25,088 bits

  • data bits: 512 * 512 bits = 32 KiB = 256 Kib (kibibits)
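Those numbers fall straight out of the geometry; a small sketch (my own, assuming the sizes above) reproduces them:

    ADDR_BITS, CACHE_BYTES, LINE_BYTES = 64, 32 * 1024, 64

    lines = CACHE_BYTES // LINE_BYTES                # 512
    offset_bits = (LINE_BYTES - 1).bit_length()      # 6
    index_bits = (lines - 1).bit_length()            # 9 (direct mapped)
    tag_bits = ADDR_BITS - offset_bits - index_bits  # 49

    print(lines * tag_bits)        # 25088 tag bits (~24.5 Kib)
    print(lines * LINE_BYTES * 8)  # 262144 data bits = 256 Kib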

---++ Example 2: 8-way Set Associative

Now let us imagine that the cache is 8 way associative. This means that the 512 lines will be divided into 512/8 = 64 sets, each containing 8 lines.

The offset inside a cache line is still bits 0-5.

However, we now need only 6 bits as an index, to determine the set number. Bits 6-11.

The tag needs to be all the remaining bits, bits 12-63, 52 bits total.

So, the tag overhead for an 8 way associative cache is 52 bits of tag for 512 bits of data.

Total:

  • tag: 512 * 52 bits = 26,624 bits

  • data: 512 * 512 bits = 256 Kib

Compare to the 49 bits of tag for direct mapped. 8-way set associative basically moves log2(8) more bits into the tag; in general, N-way set associative moves ceil(log2(N)) bits into the tag.

---++ Example 3: Fully Associative

This is the far end of the spectrum from direct mapped. We still have 512 bits of data per cache line, but now the entire 64 bit address, except for the 6 bit offset, is tag. 58 bits of tag for fully associative, versus 52 bits for 8-way, versus 49 bits for direct mapped.
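The whole spectrum can be captured in one (hypothetical) function: raising the associativity shrinks the index and grows the tag:

    import math

    def tag_bits(addr_bits, cache_bytes, line_bytes, ways):
        sets = cache_bytes // line_bytes // ways
        offset_bits = int(math.log2(line_bytes))
        index_bits = int(math.log2(sets))  # 0 when fully associative
        return addr_bits - offset_bits - index_bits

    for ways in (1, 8, 512):  # direct mapped, 8-way, fully associative
        print(ways, tag_bits(64, 32 * 1024, 64, ways))
    # 1 -> 49, 8 -> 52, 512 -> 58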

But remember I said that I dislike the term "overhead bits"? The tag bits in a fully associative cache typically have to be not just ordinary storage bits, but must also have comparators - basically XOR gates. Such "CAM (Content Addressable Memory)" bits are usually more expensive than ordinary bits.

---+ Conclusion

So, I think this is what your teacher wants: a straightforward comparison of data bits versus tag bits. This is a lower bound on overhead.

The spectrum from direct mapped through N-way set associative to fully associative provides an example. But there are other aspects of cache design that affect overhead. For example:

  • if you use a different address size, the percentage overhead changes. E.g. 32-bit addresses would have only 17 bits of tag in the direct-mapped example, versus 49 bits for a 64-bit address (see the snippet after this list).

  • if you change the cache indexing function, you may have to change the tag size. For example, there is some benefit to having a prime number of cache lines or sets in a cache, e.g. 511 lines rather than 512 for a direct-mapped cache. Prime numbers like this have reduced resonance problems. But done simply, they require increasing the tag to the full width of 58 bits.

  • schemes like sectored caches allow some parts of the tag bits to be shared.

And so on.
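For instance, the address-size point above can be checked with the tag_bits sketch from earlier:

    # Same 32 KiB / 64 B direct-mapped geometry, 32-bit addresses:
    print(tag_bits(32, 32 * 1024, 64, 1))  # 17, versus 49 for 64-bit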

As for a tutorial website:

  • sorry, I do not know of one for such beginner stuff. But I would google for class notes from many universities.

  • my website, http://comp-arch.net, covers advanced topics in computer architecture. But this sort of thing is too basic, too elementary, for me to put on comp-arch.net. Although I suppose I should probably write up some simple explanations of the basics before moving on to the advanced topics. Occasionally I write such tutorials, as here, but I have not collected them.

  • the USEnet newsgroup comp.arch may be useful.

---+ Why does this matter to programmers on Stack Overflow?

This is mainly a hardware topic.

But programmers tuning code need to understand stuff like this, to get the best performance.

answered Sep 18 '22 by Krazy Glew