
How much faster are register based architectures than stack architectures?

While studying for a compilers course, I am left wondering why we use registers at all. It is often the case that the caller or callee must save a register's value and then restore it.

In a way, they always end up using the stack anyway. Is the additional complexity of using registers really worth it?

Excuse my ignorance.

Update: Please, I know that registers are faster than RAM and the various levels of cache. My main concern is that one has to "save" the value that is in the register and then "restore" it to the register afterwards. In both cases we are accessing some kind of cache. Would it not be better to use the cache in the first place?

Asked Mar 11 '10 by Andriy Drozdyuk

People also ask

Are registers faster than the stack?

It has been said that register machines are more efficient than stack machines because register machines can be pipelined for speed while stack machines cannot.

Why is a register-based machine better than a stack-based machine?

Under non-JIT settings, a stack-based VM will be popping and then pushing the same operands many times, while a register-based VM will simply allocate the right number of registers and operate on them, which can significantly reduce the number of operations and CPU time (see the C sketch below).

Why are registers the fastest?

Registers are essentially internal CPU memory, so accesses to registers are easier and quicker than any other kind of memory access.

What is the difference between stack and register?

Stack machines have higher code density. In contrast to common stack machine instructions which can easily fit in 6 bits or less, register machines require two or three register-number fields per ALU instruction to select operands; the densest register machines average about 16 bits per instruction plus the operands.
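
To make the operand-traffic point concrete, here is a toy sketch in C. It is not any real VM's instruction set: the expression (a + b) * c and the opcode names in the comments are made up purely for illustration. The stack version pushes and pops every operand, while the register version just names them directly.

    #include <stdio.h>

    int main(void) {
        int a = 2, b = 3, c = 4;

        /* Stack machine: every operand travels through the stack. */
        int stack[8], sp = 0;
        stack[sp++] = a;                    /* PUSH a */
        stack[sp++] = b;                    /* PUSH b */
        sp--; stack[sp - 1] += stack[sp];   /* ADD: pop two, push a+b */
        stack[sp++] = c;                    /* PUSH c */
        sp--; stack[sp - 1] *= stack[sp];   /* MUL: pop two, push (a+b)*c */
        printf("stack machine:    %d\n", stack[sp - 1]);

        /* Register machine: operands are named directly, no push/pop traffic. */
        int r0 = a, r1 = b, r2 = c, r3;
        r3 = r0 + r1;                       /* ADD r3, r0, r1 */
        r3 = r3 * r2;                       /* MUL r3, r3, r2 */
        printf("register machine: %d\n", r3);

        return 0;
    }

The stack version needs five instructions and moves every value through the stack slots; the register version does the same work in two instructions once the operands are in registers.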


4 Answers

In the speed/latency hierarchy, registers are fastest (usually zero cycle latency), L1 cache is next (typically 1 or more cycles of latency), and then it goes downhill rapidly after that. So in general register accesses are "free" whereas there is always some cost involved in memory accesses, even when that access is cached.

Saving and restoring registers typically only happens (a) at the beginning/end of a function call or context switch, or (b) when the compiler runs out of registers for temporary variables and needs to "spill" one or more registers back to memory. In general, well-optimised code will keep the majority of frequently accessed ("hot") variables in registers, at least within the innermost loop(s) of a function.
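
As a hedged illustration (the exact behaviour depends on the compiler, target and optimization level), consider summing an array in C: with optimization on, a typical compiler keeps the loop counter and the running sum in registers for the whole loop, so the only memory accesses in the hot path are the unavoidable loads of the array elements.

    #include <stddef.h>

    /* With optimization enabled, `sum`, `i`, `a` and `n` would normally all
       live in registers for the duration of the loop; nothing is spilled,
       and the only memory traffic is reading a[i]. */
    long sum_array(const long *a, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }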

Answered by Paul R


I'd say it's not so much an issue with compilers as it is with CPUs. Compilers have to work with the target architecture.

Here's what the other answers are glossing over: it depends on the architecture of the CPU at the level of the actual circuitry. Machine instructions boil down to: get data from somewhere, modify the data, and load or jump to the next instruction.

Analogy

Think of the problem like a woodworker building or repairing a chair for you. His questions will be "Where is the chair?" and "What needs to be done to the chair?" He might be able to fix it at your house, or he might need to take the chair back to his shop to work on it. Either way works, but it depends on how prepared he is to work outside of a fixed location. It could slow him down, or it could be his specialty.

Now, back to the CPU.

Explanation

Regardless of how parallel a CPU may be, such as having several adders or instruction-decode pipelines, those circuits are located in specific places on the chip, and the data must be loaded into the places where the operation can be performed. The program is responsible for moving the data into and out of those locations. A stack-based machine might provide instructions that appear to modify data directly, but it may be doing the housekeeping in microcode. An adder works the same way regardless of whether the data came from the stack or from the heap. The difference is in the programming model available to the programmer. Registers are basically a defined place to work on data.

Answered by Kelly S. French


Well, it seems the answer to this was also in the book (Modern Compiler Implementation in Java). The book presents four answers:

  1. Some procedures don't call other procedures. If you draw the diagram of procedure calls and assume that each procedure calls on average 1-2 other procedures, you end up with a tree in which the "leaves" (the procedures that don't call others) outnumber the non-leaf nodes. So you win that way. Some compilers don't allocate a stack frame at all for these leaf procedures (see the sketch after this list).
  2. Some optimizing compilers use "interprocedural register allocation" - which basically means they analyse all of your source code and find smart ways of passing arguments to procedures ahead of time, thus minimizing writes to the stack.
  3. Some procedures are done with a variable before they call another function - in which case that register can simply be overwritten.
  4. Some architectures use "register windows", so that each function invocation can allocate a fresh set of registers without memory traffic.
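
As a small illustration of point 1 (hedged - the exact behaviour depends on the ABI and optimization level), a leaf function like the hypothetical one below calls nothing else, so an optimizing compiler can usually keep everything in caller-clobbered "scratch" registers and emit no stack frame and no save/restore at all:

    /* A leaf function: it calls no other procedure, so on most targets an
       optimizing compiler keeps x and y in scratch registers and never
       touches the stack - no frame, no save/restore. */
    int hypot_squared(int x, int y) {
        return x * x + y * y;
    }
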
Answered by Andriy Drozdyuk


Accessing RAM is generally MUCH slower than accessing a register, both in terms of latency and bandwidth. There are CPUs that have a hardware stack of limited size - this allows pushing registers to the stack and popping them back - but they still use registers directly for calculations. Working with a pure stack machine (of which there are many academic examples) is rather difficult too, adding more complexity.

Answered by Tronic