Referencing registers in machine code

I am looking at some assembly code and the corresponding memory dump and I am having trouble understanding what is going on. I'm using this as reference for opcodes for x86 and this as reference for registers in x86. I ran into these commands and I realized I am still missing a big piece of the puzzle.

8B 45 F8       - mov eax,[ebp-08] 
8B 80 78040000 - mov eax,[eax+00000478]
8B 00          - mov eax,[eax]

Basically I don't understand what the two bytes after the opcode mean and I can't find anywhere that gives a bit-by-bit format for the commands (if anyone could point me to one it would be much appreciated).

How does the CPU know how long each of these commands is?

According to my reference this 8B mov command allows the use of the 32b or 16b registers, meaning there are 16 possible registers (AX, CX, DX, BX, SP, BP, SI, DI, and their extended equivalents). That means you need a whole byte to specify which register to use in each operand.

Still fine so far, the two bytes after the opcode could specify which registers to use. Then I noticed that these commands are stacked byte to byte in the memory and all three of them use a different amount of bytes to specify the offset to be used when dereferencing the second operand.

I suppose you could limit the registers to only be able to use 16b with 16b and 32b with 32b, but that would only free up a single bit, not enough to tell the CPU how many bytes the offset is.

What values correspond to which registers?

The second thing that bothers me is that though my reference explicitly numbers the registers I do not see any correlation with the bytes after the opcode in these commands. These commands don't seem to be consistent even with themselves. The second and third commands are both going from eax to eax, but there is a bit midway through the first byte that is different.

Following my reference I would assume 0 is EAX, 1 is ECX, 2 is EDX, and so on. This doesn't, however, offer me any insight into how you would specify between RAX, EAX, AX, AL, and AH. Some of the commands seem to only accept 8b registers, while others take 16b or 32b, and on x86_64 some seem to take 16b, 32b, or 64b registers. So would you just do something like 0-7 are the R's, 8-15 the E's, 16-23 non-extended, and 24-31 the H's and L's? Even if it is something like that it seems like it should be a lot easier to find a manual or something specifying that.

SoggyPancakes asked Aug 05 '17 20:08


1 Answer

The first byte after the opcode is the ModR/M byte. The first reference you linked contains tables for the ModR/M byte toward the end of the page. The byte packs three fields: mod (bits 7-6), reg (bits 5-3), and r/m (bits 2-0). For memory access instructions such as these, the reg field names the register being loaded or stored, and mod together with r/m selects the addressing mode to use for the memory access.

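The bytes that follow the ModR/M byte depend on its value. The addressing mode it selects determines whether a SIB byte and an 8-bit or 32-bit displacement come next, which is how the CPU knows how long each instruction is.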

In the instruction "mov eax, [ebp-8]", the ModR/M byte is 45 (binary 01 000 101). From the table for the 32-bit ModR/M byte, this means Reg is eax and the Effective Address is [EBP]+disp8. The next byte of the instruction, F8, is the 8-bit signed displacement (-8).
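To make the table lookup concrete, here is a minimal Python sketch that splits the ModR/M byte into its fields and uses them to work out both the operands and the instruction length. The names `split_modrm` and `decode_mov_8b` are made up for illustration, and the sketch assumes 32-bit mode, opcode 8B only, no prefixes, and no SIB byte, so it is nowhere near a full decoder.

```python
import struct

# Register numbers 0-7 as used in the reg and r/m fields (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def decode_mov_8b(code):
    """Decode one 'mov r32, r/m32' (opcode 8B); return (text, length in bytes).

    Hypothetical helper: assumes 32-bit mode, no prefixes, no SIB byte.
    """
    if code[0] != 0x8B:
        raise ValueError("only opcode 8B is handled in this sketch")
    mod, reg, rm = split_modrm(code[1])
    dest = REG32[reg]
    if mod == 0b11:                      # register-to-register form
        return f"mov {dest},{REG32[rm]}", 2
    if rm == 0b100:                      # r/m = 100 means a SIB byte follows
        raise NotImplementedError("SIB addressing is not handled here")
    if mod == 0b00:
        if rm == 0b101:                  # special case: absolute disp32, no base
            (disp,) = struct.unpack_from("<I", code, 2)
            return f"mov {dest},[{disp:08X}]", 6
        return f"mov {dest},[{REG32[rm]}]", 2
    if mod == 0b01:                      # base register + signed 8-bit displacement
        (disp,) = struct.unpack_from("<b", code, 2)
        length = 3
    else:                                # mod == 10: base + signed 32-bit displacement
        (disp,) = struct.unpack_from("<i", code, 2)
        length = 6
    sign = "-" if disp < 0 else "+"
    return f"mov {dest},[{REG32[rm]}{sign}{abs(disp):X}]", length

# The three instructions from the question, stacked back to back in memory.
stream = bytes.fromhex("8B45F8" "8B8078040000" "8B00")
offset = 0
while offset < len(stream):
    text, length = decode_mov_8b(stream[offset:])
    print(offset, length, text)
    offset += length
```

Run on the bytes from the question, this prints lengths 3, 6, and 2 for `mov eax,[ebp-8]`, `mov eax,[eax+478]`, and `mov eax,[eax]`. The mod field alone (01, 10, 00) is what changes the displacement size between them.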

The operand size can be implicit in the instruction, or it can be specified by an instruction prefix. For example, in 32-bit code the 66 prefix selects 16-bit operands for a mov instruction such as those in your examples. In 64-bit mode, the 48 prefix (a REX prefix with its W bit set) selects 64-bit operands.
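As a rough illustration of how those prefixes change the operand size, the following Python sketch inspects the leading bytes of an instruction. The function name `operand_size` and its `long_mode` flag are invented for this example. It assumes a default operand size of 32 bits (32-bit code, or 64-bit mode) and looks only for a single 66 prefix followed by an optional REX prefix, ignoring all other prefixes.

```python
def operand_size(code, long_mode=False):
    """Return (operand size in bits, number of prefix bytes consumed).

    Hypothetical helper: assumes a 32-bit default operand size and handles
    only the 66 prefix and, in 64-bit mode, a REX prefix (40-4F).
    """
    size = 32
    i = 0
    if code[i] == 0x66:                  # operand-size override prefix
        size = 16
        i += 1
    # REX prefixes exist only in 64-bit mode; in 32-bit code 40-4F are inc/dec.
    if long_mode and 0x40 <= code[i] <= 0x4F:
        if code[i] & 0x08:               # REX.W set: 64-bit operands, overrides 66
            size = 64
        i += 1
    return size, i

print(operand_size(bytes.fromhex("8B45F8")))                    # plain mov
print(operand_size(bytes.fromhex("668B45F8")))                  # with 66 prefix
print(operand_size(bytes.fromhex("488B45F8"), long_mode=True))  # with REX.W
```

The three calls give 32, 16, and 64 bits respectively, so the same 8B opcode and ModR/M byte load eax, ax, or rax depending on the prefix.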

8-bit operands are usually indicated by the low bit of the opcode. If you change the opcode in your example from 8B to 8A, it becomes an 8-bit move into al.
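This also covers how the same 3-bit register number can mean al, ax, or eax. The number stays the same and the operand size picks the name. Below is a small Python sketch of that lookup. `reg_name` is a made-up helper that assumes an opcode pair like 8A/8B, where the low bit distinguishes byte from full-size operands, in 32-bit code without a REX prefix.

```python
# The same register number selects a different name depending on operand size.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]   # without a REX prefix
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def reg_name(opcode, reg_field, prefix_66=False):
    """Name the register selected by a 3-bit field for opcodes such as 8A/8B.

    Hypothetical helper: assumes 32-bit code and that the opcode's low bit
    is the byte/full-size selector, which holds for many but not all opcodes.
    """
    if opcode & 1 == 0:                  # low bit clear: 8-bit operands
        return REG8[reg_field]
    return (REG16 if prefix_66 else REG32)[reg_field]

print(reg_name(0x8B, 0))                  # eax
print(reg_name(0x8B, 0, prefix_66=True))  # ax
print(reg_name(0x8A, 0))                  # al
print(reg_name(0x8A, 4))                  # ah
```

So there is no need for 32 distinct register numbers: 0-7 is enough, and the opcode's low bit plus any prefixes say which width is meant.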

prl answered Nov 22 '22 19:11