How does the CPU/assembler know the size of the next instruction?

For the sake of example, imagine I am building a virtual machine. I have a byte array and a while loop. How do I know how many bytes to read from the byte array in order to interpret the next instruction of an Intel 8086-like instruction set?

EDIT: (commented)

The CPU reads the opcode at the instruction pointer. With the 8086 and CISC in general, you have one-byte and two-byte instructions. How do I know whether the next instruction is F or FF?

EDIT:

I found an answer myself in this piece of text from http://www.swansontec.com/sintel.html:

The operation code, or opcode, comes after any optional prefixes. The opcode tells the processor which instruction to execute. In addition, opcodes contain bit fields describing the size and type of operands to expect. The NOT instruction, for example, has the opcode 1111011w. In this opcode, the w bit determines whether the operand is a byte or a word. The OR instruction has the opcode 000010dw. In this opcode, the d bit determines which operands are the source and destination, and the w bit determines the size again. Some instructions have several different opcodes. For example, when OR is used with the accumulator register (AX or EAX) and a constant, it has the special space-saving opcode 0000110w, which eliminates the need for a separate ModR/M byte. From a size-coding perspective, memorizing exact opcode bits is not necessary. Having a general idea of what type of opcodes are available for a particular instruction is more important.
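The bit fields described in the quoted passage can be pulled out of an opcode byte with shifts and masks. The following is only a minimal sketch: the function names are made up for illustration, and it checks the opcode byte alone. On real hardware the 1111011w pattern is shared by several instructions, which are told apart by a field in the following ModR/M byte, so this is not a complete NOT detector.

```python
def opcode_fields(opcode: int) -> dict:
    """Extract the w and d bits from an opcode byte (hypothetical helper)."""
    return {
        "w": opcode & 0b1,         # bit 0: 0 = byte operand, 1 = word operand
        "d": (opcode >> 1) & 0b1,  # bit 1: direction, only meaningful for ......dw patterns
    }


def matches_not_pattern(opcode: int) -> bool:
    """True if the byte has the form 1111011w (upper seven bits fixed)."""
    return opcode >> 1 == 0b1111011


def matches_or_pattern(opcode: int) -> bool:
    """True if the byte has the form 000010dw (upper six bits fixed)."""
    return opcode >> 2 == 0b000010


if __name__ == "__main__":
    for byte in (0xF6, 0xF7, 0x09, 0x0A, 0x0C):
        print(f"{byte:#04x}", opcode_fields(byte),
              matches_not_pattern(byte), matches_or_pattern(byte))
```

Note that 0x0C (the accumulator form 0000110w) does not match the 000010dw pattern, which is how a decoder can tell the two OR encodings apart.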


asked Aug 03 '14 05:08

Ashley Meah

2 Answers

The CPU simply decodes the instruction. In the case of the 8086, the first byte tells the processor how many more bytes to fetch. The first byte need not settle the full length by itself, but it does have to indicate in some way that more bytes are needed; those bytes can in turn indicate that even more are needed. With instruction sets encoded in 8-bit units, like the x86 family, where you start with one byte and then see how much more you need, and where instructions are not aligned, you have to treat the instruction stream as a byte stream in order to decode it.

You should write yourself a very simple instruction set simulator with only a handful of instructions, maybe enough to load a register, add something to it, and then loop. It is extremely educational for what you are trying to understand, and takes maybe half an hour to write, if that.
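A simulator of the kind suggested above might look roughly like this. It is a sketch only: the four opcodes (LOAD, ADD, JNZ, HALT), their byte values, and the single accumulator are an invented toy instruction set, not anything from the 8086. The point it illustrates is that the first byte alone determines how many bytes the instruction occupies.

```python
# Hypothetical toy ISA: the first byte is the opcode and fixes the total length.
LOAD, ADD, JNZ, HALT = 0x01, 0x02, 0x03, 0xFF
LENGTH = {LOAD: 2, ADD: 2, JNZ: 2, HALT: 1}  # total bytes, opcode included


def run(program: bytes, max_steps: int = 1000) -> int:
    """Interpret the program and return the accumulator at HALT."""
    acc = 0  # single 8-bit register
    ip = 0   # instruction pointer: index of the next opcode byte
    for _ in range(max_steps):
        opcode = program[ip]
        length = LENGTH[opcode]                 # the opcode says how much more to read
        operands = program[ip + 1:ip + length]
        next_ip = ip + length                   # default: fall through
        if opcode == LOAD:
            acc = operands[0]
        elif opcode == ADD:
            acc = (acc + operands[0]) & 0xFF    # wrap around like an 8-bit register
        elif opcode == JNZ:
            if acc != 0:
                next_ip = operands[0]           # absolute jump target
        elif opcode == HALT:
            return acc
        ip = next_ip
    raise RuntimeError("step limit reached")


if __name__ == "__main__":
    # load 3; add 0xFF (i.e. subtract 1); loop back to the add while acc != 0; halt
    countdown = bytes([LOAD, 3, ADD, 0xFF, JNZ, 2, HALT])
    print(run(countdown))
```

In the countdown program the byte 0xFF appears twice, once as an operand of ADD and once as the HALT opcode. The interpreter only tells them apart because it knows where each instruction starts.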


answered Nov 15 '22 12:11

old_timer

TLDR:

The solution is more complex than a fixed size array.

It's all about context, which is why disassemblers like IDA use complex algorithms to do this.

Instructions are variable length for x86. But if you know the start of an instruction, you know where THAT INSTRUCTION ends. Because of that, you MAY know where the next one begins. I will explain the exceptions soon. But first, here's an example:

ASM:
mov eax, 0
xor eax, eax

Machine:
b8 00 00 00 00
31 c0

Explanation:

The opcode for moving an immediate value into eax is B8, followed by the 32-bit (4-byte) value to move (as eax is 32 bits wide). In other words, in 32-bit code with no prefixes, mov eax, immediate is always 5 bytes. So if you know you are at the start of an instruction (not always a safe assumption) and the byte is B8, you know it is a 5-byte instruction and that the next instruction SHOULD start 5 bytes later.
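The same reasoning can be written as a length function. This is a sketch under strong assumptions: 32-bit code, no prefixes, and only three opcodes covered (nop, mov r32 with a 32-bit immediate, and the register-to-register form of xor). The names insn_length and split_instructions are invented for this example; a real x86 decoder needs far more cases.

```python
def insn_length(code: bytes, offset: int) -> int:
    """Length in bytes of the instruction starting at offset (tiny x86 subset)."""
    opcode = code[offset]
    if opcode == 0x90:                 # nop
        return 1
    if 0xB8 <= opcode <= 0xBF:         # mov r32, imm32: opcode + 4 immediate bytes
        return 5
    if opcode == 0x31:                 # xor r/m32, r32: opcode + ModR/M byte
        modrm = code[offset + 1]
        if modrm >> 6 == 0b11:         # mod = 11 means both operands are registers
            return 2
        raise NotImplementedError("memory operand forms are not covered here")
    raise NotImplementedError(f"opcode {opcode:#04x} is not covered here")


def split_instructions(code: bytes, start: int = 0) -> list[bytes]:
    """Walk the byte stream from start, cutting it into whole instructions."""
    result = []
    offset = start
    while offset < len(code):
        length = insn_length(code, offset)
        result.append(code[offset:offset + length])
        offset += length               # the next instruction starts right after
    return result


if __name__ == "__main__":
    code = bytes.fromhex("b80000000031c0")   # mov eax, 0 ; xor eax, eax
    for insn in split_instructions(code):
        print(insn.hex(" "))
```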

Note that both instructions (mov eax, 0 and xor eax, eax) effectively do the same thing: clear eax to 0.

Exception:

Things can get tricky with jumps/calls. It is possible to jump to an address that is in the "middle of an instruction" and still execute valid code from there.

Let's look at:

mov eax, 0x90909090

machine code:

b8 90 90 90 90

If we later had a jmp instruction that jumped to the address of the 3rd byte of the above instruction (somewhere in the middle of it), the CPU would just execute 3 NOPs (no operation) and fall through to the next instruction after it, without setting eax to 0x90909090. This is because a NOP is a 1-byte instruction encoded as 0x90.
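To make the effect of the entry point concrete, here is a small sketch that decodes the same five bytes from two different starting offsets. The decode function is invented for this example and understands only nop (0x90) and mov eax, imm32 (0xB8), assuming 32-bit code with no prefixes.

```python
def decode(code: bytes, start: int) -> list[str]:
    """Decode from start to the end of code (nop and mov eax, imm32 only)."""
    listing = []
    i = start
    while i < len(code):
        opcode = code[i]
        if opcode == 0x90:
            listing.append("nop")
            i += 1
        elif opcode == 0xB8:
            # the four bytes after the opcode form a little-endian immediate
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            listing.append(f"mov eax, {imm:#x}")
            i += 5
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} is not covered here")
    return listing


if __name__ == "__main__":
    code = bytes.fromhex("b890909090")
    print(decode(code, 0))  # entering at the real start of the instruction
    print(decode(code, 2))  # entering at the 3rd byte, in the middle of it
```

Starting at offset 0 yields a single mov; starting at offset 2 (the 3rd byte) yields three nops from the very same bytes.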

answered Nov 15 '22 12:11

XlogicX
