
Same assembly instruction but different machine instructions

I'm playing with the x86 ISA. When I used NASM to convert some assembly instructions into machine code, I found something interesting.

mov [0x3412], al
mov [0x3412], bl
mov [0x3412], cl
mov [0x3412], dl

1 00000000 A21234                  mov [0x3412], al
2 00000003 881E1234                mov [0x3412], bl
3 00000007 880E1234                mov [0x3412], cl
4 0000000B 88161234                mov [0x3412], dl

As you can see, mov [0x3412], al is the exception: it is encoded in three bytes with opcode A2, while the other three take four bytes with opcode 88. I also found that mov [0x3412], al maps to two different machine instructions:

root@localhost:~/asm$ ndisasm 123
00000000  88061234          mov [0x3412],al
00000004  A21234            mov [0x3412],al

Besides this special case, are there other x86 assembly instructions that map to more than one machine encoding?

Huihoo asked Sep 08 '15 05:09



1 Answer

What you are observing is an artifact of one of the design decisions Intel made for the 8088 processor. To remain compatible with the 8088, today's x86-based processors carry forward some of those decisions, especially where the instruction set is concerned. In particular, Intel decided that the 8088 should use memory efficiently, even at some cost in performance. They created a variable-length CISC instruction set with special encodings that shrink some instructions. This differs from many RISC architectures (like the older Motorola 88000), which used fixed-length instructions and could achieve better performance.

The trade-off between speed and instruction length arose because the processor needed more time to decode the complex variable-length instructions that made the smaller encodings possible. This was true for the Intel 8088.

In older literature (circa 1980), saving space was a much more prominent consideration. The information in my answer about the AX register comes from a book on my shelf titled 8088 Assembler Language Programming: The IBM PC; some of it can also be found in online articles like this.

The following information from the online article applies directly to AX (the accumulator) and to the other general-purpose registers BX, CX, and DX:

AX is the "accumulator"; some of the operations, such as MUL and DIV, require that one of the operands be in the accumulator. Some other operations, such as ADD and SUB, may be applied to any of the registers (that is, any of the eight general- and special-purpose registers) but are more efficient when working with the accumulator.

BX is the "base" register; it is the only general-purpose register which may be used for indirect addressing. For example, the instruction MOV [BX], AX causes the contents of AX to be stored in the memory location whose address is given in BX.

CX is the "count" register. The looping instructions (LOOP, LOOPE, and LOOPNE), the shift and rotate instructions (RCL, RCR, ROL, ROR, SHL, SHR, and SAR), and the string instructions (with the prefixes REP, REPE, and REPNE) all use the count register to determine how many times they will repeat.

DX is the "data" register; it is used together with AX for the word-size MUL and DIV operations, and it can also hold the port number for the IN and OUT instructions, but it is mostly available as a convenient place to store data, as are all of the other general-purpose registers.

As you can see, Intel intended the general-purpose registers to be used for a variety of things, but each also had a specific role and often a special meaning for the instructions associated with it. In your case you are observing the fact that AX is treated as the accumulator. Intel took that into account and, for a number of instructions, added special opcodes that encode the complete instruction more compactly. You found this with the MOV instruction (with AX and AL), but it also applies to ADC, ADD, AND, CMP, OR, SBB, SUB, TEST, and XOR. Each of these instructions has a shorter encoding when used with AL or AX that is typically one byte shorter than the general form. For MOV this applies to transfers to or from a direct memory address; for the others it applies when the second operand is an immediate value. AL and AX can alternatively be encoded with the longer, general opcodes as well. In your case:

00000000  88061234          mov [0x3412],al
00000004  A21234            mov [0x3412],al

Both lines are the same instruction with two different encodings.
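To make the difference concrete, here is a minimal Python sketch (not an assembler) that builds both byte sequences by hand. It assumes 16-bit mode with no prefixes; the function names are hypothetical, while the opcodes 0x88 (MOV r/m8, r8) and 0xA2 (MOV moffs8, AL) and the ModRM layout are the ones documented in Intel's instruction reference.

```python
import struct

# 8-bit register numbers as they appear in the ModRM reg field.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def mov_mem16_r8_generic(addr, reg):
    """Encode mov [addr], reg with the general opcode 0x88 (16-bit mode)."""
    # ModRM: mod=00, reg=<register number>, r/m=110 means "bare 16-bit displacement".
    modrm = (0b00 << 6) | (REG8[reg] << 3) | 0b110
    # The address follows in little-endian order, so 0x3412 becomes 12 34.
    return bytes([0x88, modrm]) + struct.pack("<H", addr)


def mov_mem16_al_short(addr):
    """Encode mov [addr], al with the accumulator-only opcode 0xA2."""
    # No ModRM byte is needed: the opcode itself implies AL, saving one byte.
    return bytes([0xA2]) + struct.pack("<H", addr)


if __name__ == "__main__":
    for reg in ("al", "bl", "cl", "dl"):
        print(reg, mov_mem16_r8_generic(0x3412, reg).hex().upper())
    print("al (short)", mov_mem16_al_short(0x3412).hex().upper())
```

The general form works for every 8-bit register because the register number lives in the ModRM byte; the short form drops that byte, which is why it exists only for AL.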

This is a good HTML x86 instruction set reference that is available online; Intel also provides a very detailed instruction reference for the IA-32 (i386, etc.) and 64-bit architectures.
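The accumulator short forms for ADD, OR, ADC, SBB, AND, SUB, XOR, and CMP mentioned earlier follow a regular pattern when the second operand is an 8-bit immediate. The Python sketch below builds both encodings for "op al, imm8"; the helper names are hypothetical, and only the AL/imm8 case is covered. TEST follows the same idea but with its own opcodes (A8 for the short form, F6 /0 for the general one), so it is left out of the table.

```python
# /digit values of the 0x80 opcode group. For these eight operations the
# dedicated "op AL, imm8" opcode happens to be digit * 8 + 4.
GROUP1 = {"add": 0, "or": 1, "adc": 2, "sbb": 3,
          "and": 4, "sub": 5, "xor": 6, "cmp": 7}


def alu_al_imm8_short(op, imm):
    """Two-byte form: a dedicated opcode that implies AL, followed by imm8."""
    return bytes([GROUP1[op] * 8 + 4, imm & 0xFF])


def alu_al_imm8_generic(op, imm):
    """Three-byte form: opcode 0x80, a ModRM byte selecting AL, then imm8."""
    # ModRM: mod=11 (register operand), reg=/digit selects the operation,
    # r/m=000 selects AL.
    modrm = (0b11 << 6) | (GROUP1[op] << 3) | 0b000
    return bytes([0x80, modrm, imm & 0xFF])


if __name__ == "__main__":
    for op in GROUP1:
        short = alu_al_imm8_short(op, 0x12).hex().upper()
        generic = alu_al_imm8_generic(op, 0x12).hex().upper()
        print(f"{op} al, 0x12: short={short} generic={generic}")
```

An assembler such as NASM normally picks the short form on its own, so the longer encoding usually has to be emitted as raw bytes if you want to see it in a disassembly.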

Michael Petch answered Sep 29 '22 23:09