Instruction decoding when instructions are length-variable

Tags:

Here are some instructions and their corresponding encodings:

55                      push   %ebp
89 e5                   mov    %esp,%ebp
83 ec 18                sub    $0x18,%esp
a1 0c 9f 04 08          mov    0x8049f0c,%eax
85 c0                   test   %eax,%eax
74 12                   je     80484b1 <frame_dummy+0x21>
b8 00 00 00 00          mov    $0x0,%eax
85 c0                   test   %eax,%eax
74 09                   je     80484b1 <frame_dummy+0x21>
c7 04 24 0c 9f 04 08    movl   $0x8049f0c,(%esp)

Today's microprocessors are often 32- or 64-bit and I guess that they usually read data from memory in 4 byte or 8 byte chunks. However, instructions can have variable length. How does a microprocessor decode these instructions and why are they not of constant length to ease implementation?

700

asked Nov 20 '11 19:11

scdmb

2 Answers

There are very good reasons to have a fixed instruction length, implementation simplicity being the most important. That's why many processors do indeed have a fixed instruction length, like RISC processors and many early computers.

CISC instruction sets like x86 are desinged to be decoded sequentially (step by step) by microcode. (you can think of microcode as a kind of interpreter for CISC instructions) That was the state of the art in the early 80s when x86 was designed.

Nowadays this is a problem, because microcode is dead. x86 instructions are now broken into smaller µ-ops, not unlike RISC instructions. But to do so, the x86 instructions have to be decoded first. And current CPUs decode up to 4 instructions each cycle. Because there is no time to decode sequentially one instruction after another, this works simply by brute force. When a line is brought in from the instruction cache, many decoders decode the line in parallel. One instruction decoder at each possible byte offset. After the deocoding, the length of each instruction is known, and the processor decides which decoders actually provide valid instructions. This is wasteful, but very fast.

Variable instruction sizes introduce more headakes, e.g. an instruction can span two cache lines or even two pages in memory. So your observation is spot on. Today nobody would design a CISC instruction set like x86. However, some RISCs have recently introduced a second instruction size to get more compact code: MIPS16, ARM-Thumb, etc.

163

answered Oct 17 '22 09:10

Mackie Messer

EDIT: hoping to make it more readable.

The hardware does not look at the memory as a long list of unorganized bytes. All processors, fixed or variable word length, have a specific boot method. Usually a known address in the processors memory/address space with either an address to the first instruction of the boot code or the first instruction itself. From there and for every instruction the address of the current instruction is where to start decoding.

For an x86 for example it has to look at the first byte. Depending on the decoding of that byte it may need to read more opcode bytes. If the instruction required an address, offset or otherwise some form of immediate value those bytes are there as well. Very quickly the processor knows exactly how many bytes are in this instruction. If the decoding shows the instruction contains 5 bytes and it started at address 0x10 then the next instruction is at 0x10+5 or 0x15. This continues forever. Unconditional branches, which depending on the processor can come in various flavors, you dont assume the bytes that follow the instruction are another instruction. Branches, conditional or unconditional give you a clue where another instruction or series of instructions start in memory.

Note the X86 today definitely does not fetch one byte at a time when it decodes an instruction, sensible sized reads occur, probably 64 bits at a time, and the processor will pull the bytes out of that as needed. When reading a single byte from a modern processor the memory bus still does a full sized read and either presents all of those bits on the bus where the memory controller only pulls the bits it was after or it may go so far as to keep that data. You will see some processors where you may have two 32 bit read instructions at back to back addresses but only one 64 bit read happens on the memory interface.

I highly recommend you write a disassembler and/or an emulator. For fixed length instructions it is pretty easy, you just start at the beginning and decode as you go through memory. A fixed word length disassembler may help learning about decoding instructions, which is part of this process, but it wont help your understanding of following variable word length instructions and how to separate them without getting out of alignment.

The MSP430 is a good choice as a first disassembler. There are gnu tools asm and C, etc (and llvm for that matter). Start with assembler then C or take some pre-made binaries. They key is you have to walk the code like the processor, start with the reset vector and walk your way through. When you decode one instruction you know its length and know where the next instruction is until you hit an unconditional branch. Unless the programmer has intentionally left a trap to fool the disassembler, assume that all branches conditional or unconditional point at valid instructions. An afternoon or evening is all that is required to bang out the whole thing or at least get the concept. You dont necessarily need to completely decode the instruction, dont have to make this a full blown disassembler, only need to decode enough to determine the length of the instruction and determine if it is a branch and if so where. Being a 16 bit instruction you can, if you choose, one time build a table of all possible instruction bit combinations and what their lengths are, that may save some time. You still have to decode your way through branches.

Some folks might use recursion, instead I use a memory map showing me what bytes are the start of an instruction, what bytes/words are part of an instruction but not the first byte/word and what bytes I have not decoded yet. I start by taking the interrupt and reset vectors and using those to mark the starting point for instructions. and then go into a loop that decodes the instructions looking for more starting points. If a pass happens with no other starting points then I have finished that phase. If at any point I find an instruction starting point that lands in the middle of an instruction there is a problem that will require human intervention to solve. Disassembling old video game roms for example you are likely to see this, hand written assembler. Compiler generated instructions tend to be very clean and predicable. If I get through this with a clean memory map of instructions and what is left over, (assume data) I can make one pass knowing where instructions are and decode and print those out. What a disassembler for variable word length instructions sets can never do is find every instruction. If the instruction set has for example a jump table or otherwise some sort of runtime computed address for executing, you wont find all of those without actually executing the code.

There are a number of existing emulators and disassemblers out there, if you want to try to follow along instead of writing your own, I have a few myself http://github.com/dwelch67.

There are pros and cons for and against variable and fixed word length. Fixed has advantages sure, easy to read easy to decode, everything is nice and proper, but think about ram, the cache in particular, you can cram a lot more x86 instructions in the same cache as an ARM. On the other hand an ARM can decode much easier, far less logic, power, etc more bang for your buck. Historically memory was expensive logic was expensive, and a byte as you go was how it worked. a single byte opcode limited you to 256 instructions, so that expanded into some opcodes needing more bytes not to mention the immediates and addresses that made it variable word length anyway. Keep reverse compatibility for decades and you end up where you are now.

To add to all of this confusion ARM for example now has a variable word length instruction set. Thumb had a single variable word instruction, the branch, but you can easily decode this as fixed length. But they created thumb2 which really does resemble a variable word length instruction set. Also many/most of the processors that support the 32 bit ARM istructions also support 16 bit thumb instructions, so even with an ARM processor you cannot simply align the data by words and decode as you go, you have to use variable word length. What is worse the ARM to/from thumb transitions are decoded by executing, you normally cannot simply disassemble and figure out arm from thumb. A mixed mode compiler generated branch often involves loading a register with the address to branch to then using a bx instruction to branch to it, so the disassembler would need to look at the bx, look backwards in execution for the register used in the branch and hope you find a load there and hope that is the .text segment it is loading from.

answered Oct 17 '22 07:10

old_timer

Related questions
                            
                                What does qword ptr [hexvalue] mean without a base register
                            
                                Atomic operations, std::atomic<> and ordering of writes
                            
                                x86 assembly: Pop a value without storing it
                            
                                Is it possible to multiply by an immediate with mul in x86 Assembly?
                            
                                Why is there no "sub rsp" instruction in this function prologue and why are function parameters stored at negative rbp offsets?
                            
                                What is meaning of .model small in 8086 programs?
                            
                                What is faster: JMP or string of NOPs?
                            
                                What exactly does <puts@plt> mean?
                            
                                Could someone please explain what this inline #define assembly is doing?
                            
                                How to write multiline inline assembly code in GCC C++?
                            
                                Self modifying code always segmentation faults on Linux
                            
                                Why is this SIMD multiplication not faster than non-SIMD multiplication?
                            
                                error LNK2001: unresolved external symbol _MessageBox
                            
                                Displaying each assembly instruction executed in gdb
                            
                                Does a compiler always produce an assembly code?
                            
                                Is it possible to access the overflow flag register in a CPU with C++?
                            
                                What is the ".s" suffix in x86 instructions?
                            
                                Calling a non-exported function in a DLL
                            
                                How is concurrency done in Intel x86 assembly?
                            
                                What is the fastest way to test if a double number is integer (in modern intel X86 processors)

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Instruction decoding when instructions are length-variable

Tags:

cpu-architecture

assembly

microprocessors

scdmb

People also ask

2 Answers

Mackie Messer

old_timer

Recent Activity

Donate For Us