
If registers are so blazingly fast, why don't we have more of them?

In 32-bit x86 we had 8 "general purpose" registers. With 64-bit, the number doubles to 16, but that seems to be independent of the 64-bit change itself.
Now, if registers are so fast (no memory access), why aren't there naturally more of them? Shouldn't CPU builders work as many registers as possible into the CPU? What is the logical restriction on why we only have the number we have?

Asked May 21 '11 by Xeo


1 Answer

There are many reasons you don't just have a huge number of registers:

  • They're highly linked to most pipeline stages. For starters, you need to track their lifetime, and forward results back to previous stages. The complexity gets intractable very quickly, and the number of wires (literally) involved grows at the same rate. It's expensive on area, which ultimately means it's expensive on power, price and performance after a certain point.
  • It takes up instruction encoding space. 16 registers take 4 bits each for source and destination, and another 4 if you have 3-operand instructions (e.g. ARM). That's an awful lot of instruction-set encoding space taken up just to specify registers. This eventually impacts decoding, code size and, again, complexity.
  • There's better ways to achieve the same result...

These days we really do have lots of registers - they're just not explicitly programmed. We have "register renaming". While you only access a small set (8-32 registers), they're actually backed by a much larger set (e.g. 64-256). The CPU then tracks the visibility of each register, and allocates them to the renamed set. For example, you can load, modify, then store to a register many times in a row, and have each of those operations actually performed independently, depending on cache misses etc. In ARM:

ldr r0, [r4]
add r0, r0, #1
str r0, [r4]
ldr r0, [r5]
add r0, r0, #1
str r0, [r5]

Cortex A9 cores do register renaming, so the first load to "r0" actually goes to a renamed virtual register - let's call it "v0". The load, increment and store happen on "v0". Meanwhile, we also perform a load/modify/store to r0 again, but that'll get renamed to "v1" because this is an entirely independent sequence using r0. Let's say the load from the pointer in "r4" stalled due to a cache miss. That's ok - we don't need to wait for "r0" to be ready. Because it's renamed, we can run the next sequence with "v1" (also mapped to r0) - and perhaps that's a cache hit and we just had a huge performance win.

ldr v0, [v2]
add v0, v0, #1
str v0, [v2]
ldr v1, [v3]
add v1, v1, #1
str v1, [v3]

I think x86 is up to a gigantic number of renamed registers these days (ballpark 256). Making all of those architectural would mean having 8 bits times 2 for every instruction just to say what the source and destination are. It would massively increase the number of wires needed across the core, and its size. So there's a sweet spot around 16-32 registers which most designers have settled on, and for out-of-order CPU designs, register renaming is the way to mitigate it.
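To make that encoding pressure concrete, here's a rough sketch of where the register numbers live in an instruction word (field positions taken from the classic 32-bit ARM data-processing encoding, simplified):

add r0, r1, r2          @ Rd = r0, Rn = r1, Rm = r2
@ A32 data-processing encoding, register form (simplified):
@   [31:28] cond   [27:20] opcode/flags    [19:16] Rn
@   [15:12] Rd     [11:4]  shift/operand2  [3:0]   Rm
@ With 16 architected registers each register field is 4 bits: 3 x 4 = 12 bits.
@ With 256 architected registers each field would need 8 bits: 3 x 8 = 24 bits,
@ leaving only 8 bits of a 32-bit word for the opcode and everything else.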

Edit: a note on how much out-of-order execution and register renaming matter here. Once you have OOO, the number of architected registers doesn't matter so much, because they're just "temporary tags" that get renamed to the much larger virtual register set. You don't want the number to be too small, because it gets difficult to write short code sequences without spilling. This is a problem for x86-32, because the limited 8 registers mean a lot of temporaries end up going through the stack, and the core needs extra logic to forward reads/writes to memory. If you don't have OOO, you're usually talking about a small core, in which case a large register set offers a poor cost/performance trade-off.
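As a minimal sketch of what that stack traffic looks like (written in ARM syntax to match the examples above; x86-32 does the equivalent with mov to and from [esp]):

str r3, [sp, #-4]!      @ spill: push the temporary held in r3 onto the stack
@ ... r3 is now free to hold some other value ...
ldr r3, [sp], #4        @ reload: pop the spilled value back into r3
@ every spill/reload pair is a memory access the core may then have to
@ forward around - the extra logic mentioned above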

So there's a natural sweet spot for register bank size which maxes out at about 32 architected registers for most classes of CPU. x86-32 has 8 registers and it's definitely too small. ARM went with 16 registers and it's a good compromise. 32 registers is slightly too many if anything - you end up not needing the last 10 or so.

None of this touches on the extra registers you get for SSE and other vector floating point coprocessors. Those make sense as an extra set because they run independently of the integer core, and don't grow the CPU's complexity exponentially.
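For a rough ARM analogue (assuming a core with NEON, ARM's counterpart to SSE), the vector registers form a completely separate bank from the integer r0-r15:

vadd.i32 q0, q1, q2     @ q0-q15 live in the NEON register file
add      r0, r1, r2     @ r0-r15 live in the integer register file
@ the two banks are named by separate instruction encodings, so adding
@ vector registers doesn't widen the register fields of integer instructions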

Answered Sep 28 '22 by John Ripley