Here are some instructions and their corresponding encodings:
55 push %ebp
89 e5 mov %esp,%ebp
83 ec 18 sub $0x18,%esp
a1 0c 9f 04 08 mov 0x8049f0c,%eax
85 c0 test %eax,%eax
74 12 je 80484b1 <frame_dummy+0x21>
b8 00 00 00 00 mov $0x0,%eax
85 c0 test %eax,%eax
74 09 je 80484b1 <frame_dummy+0x21>
c7 04 24 0c 9f 04 08 movl $0x8049f0c,(%esp)
Today's microprocessors are often 32- or 64-bit, and I guess that they usually read data from memory in 4-byte or 8-byte chunks. However, instructions can have variable length. How does a microprocessor decode these instructions, and why aren't they a constant length, which would ease implementation?
x86 instructions are variable-length, which means that common instructions typically have a shorter encoding and so take up less space in the instruction cache. Therefore, x86 chips can get the same performance from a smaller instruction cache.
Instructions can be formatted in two ways:

- Fixed length: wastes space but is fast and results in better performance when instruction-level pipelining is used.
- Variable length: more complex to decode but saves storage space.

The most common instruction formats include zero, one, two, or three operands.
Decode instruction type, at clock pulse T2: the control unit of the CPU first has to decode the type of the instruction. The three types of instruction are memory-reference instructions, register-reference instructions, and input/output instructions.
The instruction decoder logic converts the opcode bits into settings for all the internal control lines. The operand provides a literal, a file register address, or a program address, which will be used by the instruction.
There are very good reasons to have a fixed instruction length, implementation simplicity being the most important. That's why many processors do indeed have a fixed instruction length, like RISC processors and many early computers.
CISC instruction sets like x86 are designed to be decoded sequentially (step by step) by microcode. (You can think of microcode as a kind of interpreter for CISC instructions.) That was the state of the art in the early 80s, when x86 was designed.
Nowadays this is a problem, because microcode is dead. x86 instructions are now broken into smaller µ-ops, not unlike RISC instructions. But to do so, the x86 instructions have to be decoded first. And current CPUs decode up to 4 instructions each cycle. Because there is no time to decode one instruction after another sequentially, this works simply by brute force: when a line is brought in from the instruction cache, many decoders decode the line in parallel, one instruction decoder at each possible byte offset. After the decoding, the length of each instruction is known, and the processor decides which decoders actually provide valid instructions. This is wasteful, but very fast.
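To make that concrete, here is a minimal C sketch of the brute-force scheme: a "decoder" computes a speculative length at every byte offset of a fetch line, then a pick step keeps only the decodes that begin where the previous instruction ended. The toy_len() function and the byte values are invented for illustration; real x86 length decoding is vastly more involved.

    #include <stdio.h>
    #include <stdint.h>

    #define LINE 16        /* bytes brought in from the instruction cache */

    /* Invented stand-in for a real length decoder: pretend odd opcodes
       are 3 bytes long and even opcodes are 1 byte long. */
    static int toy_len(uint8_t opcode) { return (opcode & 1) ? 3 : 1; }

    int main(void)
    {
        uint8_t line[LINE] = {2,1,0,0,0,4,3,0,0,6,8,5,0,0,2,4};
        int len[LINE];

        /* Stage 1: one "decoder" per byte offset; in hardware these all
           run in parallel, and most of the results are thrown away. */
        for (int i = 0; i < LINE; i++)
            len[i] = toy_len(line[i]);

        /* Stage 2: pick logic. Only decodes that start where the
           previous instruction ended are valid; the rest are discarded. */
        for (int i = 0; i < LINE; i += len[i])
            printf("valid instruction at offset %2d, length %d\n", i, len[i]);
        return 0;
    }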
Variable instruction sizes introduce more headaches, e.g. an instruction can span two cache lines or even two pages in memory. So your observation is spot on. Today nobody would design a CISC instruction set like x86. However, some RISCs have recently introduced a second instruction size to get more compact code: MIPS16, ARM Thumb, etc.
EDIT: hoping to make it more readable.
The hardware does not look at memory as a long list of unorganized bytes. All processors, fixed or variable word length, have a specific boot method, usually a known address in the processor's memory/address space holding either the address of the first instruction of the boot code or the first instruction itself. From there, and for every instruction, the address of the current instruction is where decoding starts.
For an x86, for example, the processor has to look at the first byte. Depending on the decoding of that byte, it may need to read more opcode bytes. If the instruction requires an address, an offset, or some other form of immediate value, those bytes are there as well. Very quickly the processor knows exactly how many bytes are in this instruction. If the decoding shows the instruction contains 5 bytes and it started at address 0x10, then the next instruction is at 0x10+5, or 0x15. This continues forever. Unconditional branches, which depending on the processor can come in various flavors, are the exception: you cannot assume the bytes that follow such an instruction are another instruction. Branches, conditional or unconditional, give you a clue where another instruction or series of instructions starts in memory.
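Here is a C sketch of that walk. The length rules below (ModRM, SIB, displacement and immediate sizes) are real 32-bit x86 encoding rules, but only the handful of opcodes from the listing at the top are covered, so treat it as an illustration rather than a decoder:

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Length of a ModRM byte plus any SIB byte or displacement that
       follows it (32-bit addressing modes only; a sketch, not a complete
       x86 decoder: e.g. SIB bytes with base=101 are not handled). */
    static size_t modrm_len(const uint8_t *p)
    {
        uint8_t mod = p[0] >> 6, rm = p[0] & 7;
        size_t len = 1;                        /* the ModRM byte itself */
        if (mod != 3 && rm == 4) len += 1;     /* SIB byte follows      */
        if (mod == 1) len += 1;                /* disp8                 */
        else if (mod == 2 || (mod == 0 && rm == 5)) len += 4; /* disp32 */
        return len;
    }

    /* Encoded length of the instruction at p, handling only the opcodes
       that appear in the listing at the top of the page. */
    static size_t insn_len(const uint8_t *p)
    {
        switch (p[0]) {
        case 0x55: return 1;                        /* push %ebp           */
        case 0x89:                                  /* mov r32 -> r/m32    */
        case 0x85: return 1 + modrm_len(p + 1);     /* test r32, r/m32     */
        case 0x83: return 1 + modrm_len(p + 1) + 1; /* sub etc. r/m32,imm8 */
        case 0xa1: return 1 + 4;                    /* mov moffs32 -> %eax */
        case 0x74: return 2;                        /* je rel8             */
        case 0xb8: return 1 + 4;                    /* mov imm32 -> %eax   */
        case 0xc7: return 1 + modrm_len(p + 1) + 4; /* movl imm32 -> r/m32 */
        default:   return 0;                        /* unknown: give up    */
        }
    }

    int main(void)
    {
        /* The raw bytes from the listing at the top. */
        static const uint8_t code[] = {
            0x55, 0x89,0xe5, 0x83,0xec,0x18, 0xa1,0x0c,0x9f,0x04,0x08,
            0x85,0xc0, 0x74,0x12, 0xb8,0x00,0x00,0x00,0x00,
            0x85,0xc0, 0x74,0x09, 0xc7,0x04,0x24,0x0c,0x9f,0x04,0x08,
        };
        size_t off = 0;
        while (off < sizeof code) {
            size_t n = insn_len(code + off);
            if (n == 0) break;
            printf("offset %2zu: %zu-byte instruction\n", off, n);
            off += n;  /* next instruction starts where this one ends */
        }
        return 0;
    }

Feeding it the 31 bytes of the listing prints one line per instruction, at offsets 0, 1, 3, 6, 11, and so on: exactly the instruction boundaries shown in the listing.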
Note that the x86 today definitely does not fetch one byte at a time when it decodes an instruction; sensibly sized reads occur, probably 64 bits at a time, and the processor pulls the bytes out of that as needed. When a modern processor reads a single byte, the memory bus still does a full-sized read and either presents all of those bits on the bus, where the memory controller pulls out only the bits it was after, or goes so far as to keep that data. You will see processors where two 32-bit read instructions at back-to-back addresses result in only one 64-bit read on the memory interface.
I highly recommend you write a disassembler and/or an emulator. For fixed-length instructions it is pretty easy: you just start at the beginning and decode as you go through memory. A fixed-word-length disassembler may help you learn about decoding instructions, which is part of this process, but it won't help your understanding of following variable-word-length instructions and how to separate them without getting out of alignment.
The MSP430 is a good choice for a first disassembler. There are GNU tools for it, assembler and C, etc. (and llvm, for that matter). Start with assembler, then C, or take some pre-made binaries. The key is that you have to walk the code like the processor does: start with the reset vector and walk your way through. When you decode one instruction you know its length, and therefore where the next instruction is, until you hit an unconditional branch. Unless the programmer has intentionally left a trap to fool the disassembler, assume that all branches, conditional or unconditional, point at valid instructions. An afternoon or evening is all that is required to bang out the whole thing, or at least get the concept. You don't necessarily need to completely decode each instruction, and you don't have to make this a full-blown disassembler; you only need to decode enough to determine the length of the instruction, and whether it is a branch and if so where to. Since the instructions are 16 bits, you can, if you choose, build a one-time table of all possible instruction bit combinations and their lengths, which may save some time (a sketch of that table follows). You still have to decode your way through branches.
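As a sketch of that table in C: msp430_words() computes an instruction's length in 16-bit words from its first word alone, following the MSP430 encoding as I understand it (jumps are always one word; indexed/symbolic/absolute operands and immediates each add an extension word). Treat the corner cases, especially around the constant-generator registers, as unverified:

    #include <stdio.h>
    #include <stdint.h>

    /* How many 16-bit words the MSP430 instruction starting with word w
       occupies. A sketch for table building, not a verified decoder. */
    static int msp430_words(uint16_t w)
    {
        if (w < 0x1000) return 1;             /* reserved/invalid opcodes  */
        if ((w & 0xE000) == 0x2000) return 1; /* format III: jumps, 1 word */

        int words = 1;
        unsigned as = (w >> 4) & 3;           /* source addressing mode    */

        if ((w & 0xF800) == 0x1000) {         /* format II: single operand */
            unsigned reg = w & 0xF;
            if ((as == 1 && reg != 3) ||      /* indexed/symbolic/absolute */
                (as == 3 && reg == 0))        /* immediate (via PC)        */
                words++;
            return words;
        }

        /* format I: double operand (opcodes 0x4xxx..0xFxxx) */
        unsigned src = (w >> 8) & 0xF;
        if ((as == 1 && src != 3) || (as == 3 && src == 0))
            words++;                          /* source extension word     */
        if (w & 0x0080)                       /* Ad=1: destination indexed */
            words++;
        return words;
    }

    /* The one-time table: one length entry per possible first word. */
    static uint8_t len_table[65536];

    int main(void)
    {
        for (uint32_t w = 0; w < 65536; w++)
            len_table[w] = (uint8_t)msp430_words((uint16_t)w);
        /* e.g. 0x4031 is "mov #imm, sp": the immediate adds a word */
        printf("0x4031 -> %d words\n", len_table[0x4031]);
        return 0;
    }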
Some folks might use recursion; instead, I use a memory map showing which bytes are the start of an instruction, which bytes/words are part of an instruction but not its first byte/word, and which bytes I have not decoded yet. I start by taking the interrupt and reset vectors and using those to mark starting points for instructions, then go into a loop that decodes instructions looking for more starting points. If a pass completes with no new starting points, that phase is finished. If at any point I find an instruction starting point that lands in the middle of another instruction, there is a problem that will require human intervention to solve. Disassembling old video game ROMs, for example, you are likely to see this: hand-written assembler. Compiler-generated instructions tend to be very clean and predictable.

If I get through this with a clean memory map of instructions and what is left over (assume data), I can make one pass, knowing where the instructions are, and decode and print them. A sketch of the marking loop follows. What a disassembler for variable-word-length instruction sets can never do is find every instruction: if the instruction set has, for example, a jump table or some other runtime-computed execution address, you won't find all of those without actually executing the code.
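Here is a self-contained C sketch of that memory map. To keep the marking logic visible it uses an invented toy instruction set (one-byte opcodes, where 0x01 is a two-byte unconditional jump and 0x02 a two-byte conditional branch) instead of a real one:

    #include <stdio.h>
    #include <stdint.h>

    /* A toy instruction set, invented just to show the marking scheme:
         0x00      halt                       (no fall-through)
         0x01 tt   unconditional jump to tt   (no fall-through)
         0x02 tt   conditional branch to tt   (also falls through)
         others    one-byte instruction that falls through          */
    enum mark { UNKNOWN, INSN_START, INSN_BODY };

    static const uint8_t code[] = {
        0x10,        /* 0: op                           */
        0x02, 0x06,  /* 1: branch to 6, falls through   */
        0x01, 0x08,  /* 3: jump to 8, no fall-through   */
        0xAA,        /* 5: never reached: data          */
        0x00,        /* 6: halt                         */
        0xAA,        /* 7: data                         */
        0x10, 0x00,  /* 8: op, halt                     */
    };
    #define N ((uint8_t)sizeof code)

    static uint8_t map[sizeof code];   /* one mark per byte          */
    static uint8_t work[64];           /* pending instruction starts */
    static int top;

    static void push(uint8_t a)
    {
        if (a >= N) return;
        if (map[a] == INSN_BODY)       /* start lands mid-instruction */
            printf("conflict at %d: needs human intervention\n", a);
        else if (map[a] == UNKNOWN)
            work[top++] = a;
    }

    int main(void)
    {
        push(0);                       /* seed: reset/interrupt vectors */
        while (top > 0) {
            uint8_t a = work[--top];
            while (a < N && map[a] == UNKNOWN) {
                uint8_t op  = code[a];
                uint8_t len = (op == 0x01 || op == 0x02) ? 2 : 1;
                map[a] = INSN_START;
                for (uint8_t i = 1; i < len; i++)
                    map[a + i] = INSN_BODY;
                if (op == 0x01 || op == 0x02)
                    push(code[a + 1]); /* mark the branch target later */
                if (op == 0x00 || op == 0x01)
                    break;             /* no fall-through past these   */
                a += len;              /* otherwise continue linearly  */
            }
        }
        for (int a = 0; a < N; a++)
            printf("%d: %s\n", a, map[a] == INSN_START ? "insn start" :
                                  map[a] == INSN_BODY  ? "insn body"  :
                                                         "not decoded (data?)");
        return 0;
    }

The bytes at addresses 5 and 7 are never marked, which is exactly the "what is left over, assume data" state described above.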
There are a number of existing emulators and disassemblers out there; if you want to follow along instead of writing your own, I have a few myself: http://github.com/dwelch67.
There are pros and cons for variable and fixed word length. Fixed has advantages, sure: easy to read, easy to decode, everything is nice and proper. But think about RAM, and the cache in particular: you can cram a lot more x86 instructions into the same cache as ARM instructions. On the other hand, an ARM can decode much more easily, with far less logic and power; more bang for your buck. Historically, memory was expensive and logic was expensive, and a byte at a time was how it worked. A single-byte opcode limits you to 256 instructions, so that expanded into some opcodes needing more bytes, not to mention the immediates and addresses that made it variable word length anyway. Keep backward compatibility for decades and you end up where you are now.
To add to all of this confusion, ARM, for example, now has a variable-word-length instruction set. Thumb had a single variable-word instruction, the branch, but you could still easily decode it as fixed-length. Then they created Thumb2, which really does resemble a variable-word-length instruction set. Also, many/most of the processors that support the 32-bit ARM instructions also support 16-bit Thumb instructions, so even with an ARM processor you cannot simply align the data by words and decode as you go; you have to deal with variable word length. What is worse, the ARM-to/from-Thumb transitions are decided by executing the code: you normally cannot simply disassemble and figure out ARM from Thumb. A mixed-mode, compiler-generated branch often involves loading a register with the address to branch to and then using a bx instruction to branch to it, so the disassembler would need to look at the bx, look backwards in execution order for the register used in the branch, hope it finds a load there, and hope that load is from the .text segment.