 

Which is more useful at an assembly level, 64 registers or three operand instructions? [closed]

This question is in the context of writing a C compiler for a 16 bit homebrew CPU.

I have 12 bits of operand for ALU instructions (such as ADD, SUB, AND, etc.).

I could give instructions three operands from 16 registers or two operands from 64 registers.

e.g.

SUB A <- B - C  (registers r0-r15)

vs

SUB A <- A - B  (registers r0-r63)

Are sixteen registers, with three-operand instructions, more useful than 64 registers with two-operand instructions, to C compilers and their authors?
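
To make the two options concrete, here is a minimal sketch of how the 12 operand bits could be split in each format. The 16-bit instruction word, the 4-bit opcode field in the top nibble, and the names `encode3` and `encode2` are assumptions for illustration only, not part of the actual design.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4-bit opcode in bits 15..12, 12 operand bits below it. */

/* Three-operand form: rd, ra, rb are 4 bits each (r0-r15). */
static uint16_t encode3(unsigned opcode, unsigned rd, unsigned ra, unsigned rb)
{
    assert(opcode < 16 && rd < 16 && ra < 16 && rb < 16);
    return (uint16_t)((opcode << 12) | (rd << 8) | (ra << 4) | rb);
}

/* Two-operand form: rd (also the first source) and rs are 6 bits each (r0-r63). */
static uint16_t encode2(unsigned opcode, unsigned rd, unsigned rs)
{
    assert(opcode < 16 && rd < 64 && rs < 64);
    return (uint16_t)((opcode << 12) | (rd << 6) | rs);
}
```

Both forms use exactly the same 12 bits; the only difference is whether they name three small register numbers or two larger ones.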

fadedbee asked May 17 '16 11:05



1 Answer

16 registers with non-destructive 3-operand instructions is probably better.

However, you should also consider doing something else interesting with those instruction bits. For homebrew, you probably don't care about reserving any for future extensions, and don't want to add a ton of extra opcodes (like PPC does).

ARM takes the interesting approach of having one operand to each instruction go through the barrel shifter, so every instruction is a "shift-and-whatever" instruction for free. This is supported even in Thumb mode, where the most common instructions are only 16 bits. (ARM mode has the traditional RISC 32-bit fixed instruction size. It dedicates 4 of those bits to predicated execution for every instruction.)
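
As a rough illustration of what "shift-and-whatever" buys you, the C below models an add whose second source passes through a left shift, which is how ARM-mode `add rd, ra, rb, lsl #n` behaves. The function names `add_shifted` and `elem_addr` are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Models "add rd, ra, rb, lsl #shift": the second source is shifted first. */
static uint32_t add_shifted(uint32_t ra, uint32_t rb, unsigned shift)
{
    assert(shift < 32);
    return ra + (rb << shift);
}

/* Address of a 4-byte array element: base + (index << 2).
   With a shifted operand this is a single add instead of shift + add. */
static uint32_t elem_addr(uint32_t base, uint32_t index)
{
    return add_shifted(base, index, 2);
}
```

Array indexing like this is very common in compiled C, which is why spending instruction bits on a shift amount can pay off.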


I remember seeing a study on the perf gains from doubling the number of registers in a theoretical architecture, for SPECint or something. 8->16 was maybe 5 or 10%, 16->32 was only a couple %, and 32->64 was even smaller.

So 16 integer registers is "enough" most of the time, unless you're working with int32_t a lot, since each such value will take two 16-bit registers. x86-64 only has 16 GP registers, and most functions can keep a lot of their state live in registers pretty comfortably. Even in loops that make function calls, there are enough call-preserved registers in the ABI that spill/reload often doesn't have to happen in the loop.
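
To show why int32_t eats registers on a 16-bit machine, here is a small sketch of a 32-bit add done as two 16-bit halves with a carry, roughly what an ADD / add-with-carry pair would do. The `reg_pair` type and the helper names are hypothetical, and whether your ISA has an add-with-carry instruction is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit value held in two 16-bit "registers". */
typedef struct { uint16_t lo, hi; } reg_pair;

static reg_pair add32(reg_pair a, reg_pair b)
{
    reg_pair r;
    r.lo = (uint16_t)(a.lo + b.lo);            /* ADD: low halves */
    unsigned carry = r.lo < a.lo;              /* carry out of the low add */
    r.hi = (uint16_t)(a.hi + b.hi + carry);    /* ADC: high halves plus carry */
    return r;
}

/* Self-check against native 32-bit arithmetic. */
static void check_add32(uint32_t x, uint32_t y)
{
    reg_pair a = { (uint16_t)x, (uint16_t)(x >> 16) };
    reg_pair b = { (uint16_t)y, (uint16_t)(y >> 16) };
    reg_pair r = add32(a, b);
    assert((((uint32_t)r.hi << 16) | r.lo) == (uint32_t)(x + y));
}
```

Two inputs and one result already occupy up to six 16-bit registers, so code that is heavy on 32-bit values is where a larger register file would help most.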

The gains in code size and instruction count from 3-operand instructions will be bigger than from saving the occasional spill / reload. On x86, where most integer instructions are destructive 2-operand, gcc output has to mov all the time, and use lea as a non-destructive add / shift.
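
A minimal sketch of where those extra instructions come from: the hypothetical helper below lowers `rd = ra - rb` onto a destructive two-operand machine and returns how many instructions it emitted. The mnemonics, and the existence of NEG and ADD instructions, are assumptions for illustration; a three-operand machine would need exactly one SUB in every case.

```c
#include <assert.h>
#include <stdio.h>

/* Emit two-operand code for "rd = ra - rb"; returns the instruction count. */
static int lower_sub(int rd, int ra, int rb)
{
    assert(rd >= 0 && ra >= 0 && rb >= 0);
    if (rd == ra) {                         /* already destructive: rd -= rb */
        printf("sub r%d, r%d\n", rd, rb);
        return 1;
    }
    if (rd == rb) {                         /* a mov into rd would clobber rb */
        printf("neg r%d\n", rd);            /* rd = -rb (assumed NEG) */
        printf("add r%d, r%d\n", rd, ra);   /* rd = -rb + ra = ra - rb */
        return 2;
    }
    printf("mov r%d, r%d\n", rd, ra);       /* extra copy so ra survives */
    printf("sub r%d, r%d\n", rd, rb);       /* rd = ra - rb */
    return 2;
}
```

Only the `rd == ra` case is free; whenever both inputs are still live afterwards, the two-operand form pays for an extra instruction.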


If you want to optimize your CPU for software-pipelining to hide memory load latency (which is simpler than full out-of-order execution), more registers are great, esp. if you don't have register renaming. However, I'm not sure how good compilers are at static instruction scheduling. It's not a hot topic anymore, since all high performance CPUs are out-of-order. (OTOH, a lot of software that people actually use is running on in-order ARM CPUs in smartphones.) I don't have experience trying to get compilers to optimize for in-order CPUs, so IDK how viable it is to depend on that.
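
For reference, this is roughly the shape of a software-pipelined loop, written out by hand in C: the load for the next iteration is issued before the value from the previous load is used, which costs an extra live register (`next`). The function name `sum_pipelined` is made up, and a real compiler is free to reorder this, so treat it as a picture of the schedule rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t sum_pipelined(const uint16_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n == 0)
        return 0;

    uint16_t sum = 0;
    uint16_t cur = a[0];                 /* prologue: first load */
    for (size_t i = 1; i < n; i++) {
        uint16_t next = a[i];            /* start the load for the next step */
        sum = (uint16_t)(sum + cur);     /* use the value loaded one step ago */
        cur = next;
    }
    return (uint16_t)(sum + cur);        /* epilogue: consume the last load */
}
```

Deeper pipelining (loads two or three iterations ahead) needs correspondingly more registers, which is where a 64-register file would start to matter.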

If your CPU is so simple that it can't do anything else while a load is in-flight, this probably doesn't matter. (This is getting really hand-wavy because I don't know enough about what's practical for a simple design. Even "simple" in-order modern CPUs are pipelined.)


64 registers is getting into "too many" territory, where saving/restoring them takes a lot of code. The amount of memory is probably still negligible, but since you can't loop over registers, you'd need 64 instructions to save them all and another 64 to restore them.


If you're designing an ISA from scratch, have a look at Agner Fog's CRISC proposal and the resulting discussion. Your goals are very different (high performance / power budget 64-bit CPU vs. simple 16-bit), so your ISAs will of course be very different. However, the discussion may get you to think of things you hadn't considered, or ideas you want to try.

Peter Cordes answered Sep 26 '22 17:09
