
Why didn't Intel make the high-order part of their CPUs' registers available?

When programming in assembly and doing some sort of string manipulation, I use al, ah and sometimes the other byte registers to hold characters, because this lets me keep more data in registers. I think this is a very handy feature, but Intel's engineers don't seem to agree with me, because they didn't make the two high-order bytes of the 32-bit registers accessible (or am I wrong?). I don't understand why. I thought about this for a while and my guesses are:

  1. They would make the CPU too complicated
  2. They would be useless
  3. perhaps both of the above

I came up with number two because I've never seen a compiled program (say with gcc) use al or bh or any of them.

BlackBear asked Mar 15 '11 20:03




4 Answers

Although it's a little clumsy, you can just swap the halves of a register with rol reg,16 (or ror reg,16, if you prefer), after which the former upper bytes are reachable through the usual low-byte names. On the NetBurst CPUs (Pentium 4) that's quite inefficient, but most newer (or older) CPUs have a barrel shifter that does it in one clock.
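To make the rotate trick concrete, here is a minimal sketch in Python that models a 32-bit register as an integer. The helper names (`rol32`, `low_byte`, `high_byte`) are made up for illustration; they only mimic what `rol reg,16` followed by a read of `al` or `ah` would give you.

```python
MASK32 = 0xFFFFFFFF


def rol32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits, like `rol reg, count`."""
    count %= 32
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32


def low_byte(value: int) -> int:
    """Bits 0-7: what a read of `al` would return."""
    return value & 0xFF


def high_byte(value: int) -> int:
    """Bits 8-15: what a read of `ah` would return."""
    return (value >> 8) & 0xFF


eax = 0x11223344
swapped = rol32(eax, 16)       # upper and lower halves trade places
byte2 = low_byte(swapped)      # original bits 16-23
byte3 = high_byte(swapped)     # original bits 24-31
restored = rol32(swapped, 16)  # a second rotate undoes the first
```

The cost is the extra rotate (and a second one if the original layout has to be restored), which is the clumsiness mentioned above.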

As for why they didn't do it, it's pretty simple: they'd need to thoroughly redesign the instruction encoding if they really wanted to do it. In the original design, they used up all the codes that fit in the fields used to specify a register (three bits, so eight codes per field). In fact, they already use something of a hack where the meaning of an encoding depends on the mode, and there are address-size and operand-size prefixes if you need to use a different size. For example, to use AX when you're running in 32-bit mode, the instruction will have an operand-size override prefix before the instruction itself. If they'd wanted to badly enough, they could have extended that concept to specify things like "the byte in bits 16-23 of register X", but it would make decoding more complex, and decoding x86 instructions is already relatively painful.
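As a rough illustration of the encoding point, the sketch below decodes only register-to-register `mov` (opcodes 0x88 and 0x89 with a ModRM byte whose mod field is 11) as it would be interpreted in 32-bit mode. The function name `decode_mov_rr` is made up for this example, and it is nowhere near a real x86 decoder; it just shows that each 3-bit register field has eight codes, all already taken, and that the 0x66 prefix is what switches the same opcode from 32-bit to 16-bit operands.

```python
# The three name tables share the same 3-bit codes 0..7.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

OPERAND_SIZE_PREFIX = 0x66


def decode_mov_rr(code: bytes) -> str:
    """Decode `mov r/m, r` in register-direct form, assuming 32-bit mode."""
    i = 0
    operand16 = False
    if code[i] == OPERAND_SIZE_PREFIX:
        operand16 = True  # the default operand size in 32-bit mode is overridden
        i += 1
    opcode, modrm = code[i], code[i + 1]
    if opcode not in (0x88, 0x89) or (modrm >> 6) != 0b11:
        raise ValueError("only register-to-register mov is handled here")
    reg = (modrm >> 3) & 0b111  # source register field
    rm = modrm & 0b111          # destination register field
    if opcode == 0x88:          # byte-sized form
        names = REG8
    else:                       # word or doubleword form
        names = REG16 if operand16 else REG32
    return f"mov {names[rm]}, {names[reg]}"


print(decode_mov_rr(bytes([0x89, 0xD0])))        # mov eax, edx
print(decode_mov_rr(bytes([0x66, 0x89, 0xD0])))  # mov ax, dx
print(decode_mov_rr(bytes([0x88, 0xE0])))        # mov al, ah
```

With all eight byte codes already assigned to al, cl, dl, bl, ah, ch, dh and bh, there is no spare code in that field for a name such as "bits 16-23 of eax"; it would need another prefix or a new encoding.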

Jerry Coffin answered Nov 15 '22 07:11


Short answer is because of how it evolved from 16 bits.

Why is there not a register that contains the higher bytes of EAX?

dwidel answered Nov 15 '22 09:11


Beyond the instruction encoding issue that Jerry correctly mentions, there are other things at work here as well.

Most non-trivial CPUs are pipelined: this means that in ordinary operation, instructions begin executing before previous instructions have finished execution. This means that the processor must detect any dependencies of an instruction on earlier instructions and prevent the instruction from executing until the data (or condition flags) on which it depends are available[1].

Having names for different parts of a register complicates this dependency tracking. If I write:

mov  ax,  dx
add  eax, ecx

then the core needs to know that ax is part of eax, and that the add should wait until the result of the move is available. This is called a partial register update; although it seems very simple, hardware designers generally dislike them, and try to avoid needing to track them as much as possible (especially in modern out-of-order processors).
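The dependency can be seen by modelling the two instructions on plain integers. This is only a sketch of the architectural result, not of how any particular core implements it; `write_low16` and `run` are names invented for the example.

```python
MASK32 = 0xFFFFFFFF


def write_low16(full: int, value: int) -> int:
    """Model a 16-bit write such as `mov ax, dx`.

    Only bits 0-15 are replaced; bits 16-31 of the destination survive,
    so the new full-width value depends on the old one.
    """
    return (full & 0xFFFF0000) | (value & 0xFFFF)


def run(eax: int, ecx: int, edx: int) -> int:
    eax = write_low16(eax, edx)  # mov ax, dx
    eax = (eax + ecx) & MASK32   # add eax, ecx (needs the merged value)
    return eax


result = run(eax=0xAAAA0000, ecx=1, edx=0x12345678)
print(hex(result))  # 0xaaaa5679
```

The `add` cannot start until the merge of the old upper half with the new lower half is known, which is exactly the bookkeeping the hardware has to do for every partial write.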

Having names for the high halves of the registers adds an additional set of partial register names that must be tracked, which adds die area and power usage, but delivers little benefit. At the end of the day, this is how CPU design decisions are made: a tradeoff of die area (and power) vs. benefit.
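One way to picture the extra tracking is to treat each register name as a bit range within a physical register and ask which names overlap. The sketch below does that. Note that `eax_b2` and `eax_b3` are hypothetical names invented here for the two upper bytes; no such registers exist in x86, and real hardware does not track dependencies with a table like this.

```python
# name -> (underlying register, lowest bit, width in bits)
VIEWS = {
    "eax": ("A", 0, 32),
    "ax": ("A", 0, 16),
    "al": ("A", 0, 8),
    "ah": ("A", 8, 8),
    "eax_b2": ("A", 16, 8),  # hypothetical: bits 16-23 of eax
    "eax_b3": ("A", 24, 8),  # hypothetical: bits 24-31 of eax
    "ecx": ("C", 0, 32),
    "cl": ("C", 0, 8),
}


def overlaps(a: str, b: str) -> bool:
    """True if the two names share at least one bit of the same register."""
    reg_a, low_a, width_a = VIEWS[a]
    reg_b, low_b, width_b = VIEWS[b]
    return reg_a == reg_b and low_a < low_b + width_b and low_b < low_a + width_a


def dependents(written: str) -> list[str]:
    """Names whose value changes when `written` is written."""
    return sorted(n for n in VIEWS if n != written and overlaps(written, n))


print(dependents("ah"))      # ['ax', 'eax']
print(dependents("eax_b2"))  # ['eax']
print(dependents("eax"))     # ['ah', 'al', 'ax', 'eax_b2', 'eax_b3']
```

Every added name is one more range that each write to the full register has to be checked against, which is the die-area and power cost described above.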

Partial register updates aren't the only thing that would be complicated by having names for the high parts of the register, but it's one of the simplest to explain; there are many other small things that would need to become more complicated in a modern x86 CPU to support it; considered in aggregate, the additional complexity would be substantial.

[1] There are other ways to resolve dependencies, but we ignore them here for simplicity; they introduce similar problems.

Stephen Canon answered Nov 15 '22 08:11


To add to what Jerry and Stephen have said so far.

My first thought is that you have to be conservative with your opcodes and instruction encoding. The design started with ax, ah, and al. Is there any value added, when going to eax, in providing byte-based access to the upper half of the register (beyond the rotates and shifts that already provide it)? Not really. If you are doing byte operations, why are you using a 32-bit register, and why the upper bytes? Perhaps optimize the code differently, taking advantage of what is available, or tolerate what is available and gain the advantage in other areas.

I think there is a reason that the majority of the world's instruction sets do not have this four-names-for-the-same-register thing, and I don't think it is patents that are at play. In its day it was probably a cool feature or design. It probably had its roots in transitioning folks from 8-bit processors to this 8/16-bit thing. Anyway, I think al, ah, ax, eax was bad design and everyone learned from that. As Stephen mentioned, you have hardware issues at play: if you were to implement this strictly in direct logic it is a mess, a rat's nest of muxes to wire everything up (bad for speed and bad for power), and then you get into the timing nightmare Stephen was talking about. But there is a history of microcoding for this instruction set, so you are essentially emulating these instructions with some other processor, and that adds to the nightmare in the same way. The wise thing to do would have been to redefine ax to be 32 bits and get rid of ah and al. Wise from a design perspective but unwise for portability (good for engineering, bad for marketing, sales, etc.). I think the reason that tired old instruction set is not limited to history books and museums is (among a few other reasons) backward compatibility.

I highly recommend learning a number of other instruction sets, both new and old: msp430, ARM, Thumb, MIPS, 6502, Z80, PIC (the old one that isn't a MIPS), to name a few. Seeing the differences and similarities between instruction sets is very educational IMO. And, depending on how deep you go (variable word length vs. fixed length, etc.), it helps you understand what choices were available to Intel when making the 16-bit to 32-bit and, more recently, the 32-bit to 64-bit transition while trying to retain market share.

I think the solution they chose at the time was the right choice: insert a formerly undefined opcode in front of what normally decodes as a 16-bit opcode, turning it into a 32-bit opcode. Or sometimes not, if there are no immediate values that follow (requiring the knowledge of how many to read). It seemed in line with the instruction set at the time. So it is back to Jerry's answer: the reason is a combination of the design of the 8/16-bit instruction set and the history of, and reasons for, expanding it. Granted, they could just as easily have used a similar encoding to provide access to the upper 16 bits in an ax, ah, al fashion, and they could just as easily have multiplied the four base registers A, B, C, D into 8 or 16 or 32 general-purpose registers (A, B, C, D, E, F, G, H, ...) while remaining backward compatible.

old_timer answered Nov 15 '22 07:11