x86 instruction encoding: how to choose an opcode

Q: What is x86 opcode?

The x86 opcode bytes are 8-bit equivalents of the iii field discussed in the simplified encoding. This provides for up to 256 different instruction classes, although the x86 does not yet use them all.

Tags:

assembly

x86-64

compiler-construction

disassembly

When encoding the instruction cmpw $-5, %ax for x86-64, the Intel instruction set reference manual gives me two opcodes to choose from:

Opcode     Instruction        Op/En  64-bit Mode  Compat/Leg Mode  Description
3D iw      CMP AX, imm16      I      Valid        Valid            Compare imm16 with AX.
83 /7 ib   CMP r/m16, imm8    MI     Valid        Valid            Compare imm8 with r/m16.

So there will be two encoding results:

66 3d fb ff ; this for opcode 3d
66 83 f8 fb ; this for opcode 83
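
The two byte sequences above can be reproduced with a small sketch. This is only an illustration, not an assembler: the function names are made up for this example, it assumes 32/64-bit mode (so the 66 prefix is needed for 16-bit operand size), and it only handles a register-direct operand for the 83 /7 form.

```python
import struct

def encode_cmp_ax_imm16(imm: int) -> bytes:
    """66 3D iw: CMP AX, imm16 (accumulator-only short form)."""
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    # Prefix, opcode, then the immediate as 2 little-endian bytes.
    return bytes([0x66, 0x3D]) + struct.pack("<H", imm & 0xFFFF)

def encode_cmp_r16_imm8(reg: int, imm: int) -> bytes:
    """66 83 /7 ib: CMP r16, imm8 (imm8 is sign-extended by the CPU)."""
    if not 0 <= reg <= 7:
        raise ValueError("register number must be 0..7 (ax is 0)")
    if not -128 <= imm <= 127:
        raise ValueError("immediate does not fit in a signed byte")
    # ModRM: mod=11 (register-direct), reg field=7 (the /7), rm=register.
    modrm = 0xC0 | (7 << 3) | reg
    return bytes([0x66, 0x83, modrm, imm & 0xFF])

print(encode_cmp_ax_imm16(-5).hex())     # 663dfbff
print(encode_cmp_r16_imm8(0, -5).hex())  # 6683f8fb
```

Both results are 4 bytes long, which is why length alone does not settle the choice.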

Then which one is better?

I tried the online disassemblers below:

https://defuse.ca/online-x86-assembler.htm#disassembly2
https://onlinedisassembler.com/odaweb/

Both disassemble back to the original instruction. But why does 6683fb00 also work, while 663dfb doesn't?


asked Jun 03 '16 09:06

Steve

1 Answer

Both encodings are the same length, so that doesn't help us decide.

However, as @Michael Petch commented, the imm16 encoding will cause an LCP stall in the decoders on Intel CPUs. Without the 66 operand-size prefix, the instruction would be 3D imm32, so the operand-size prefix changes the length of the rest of the instruction. This is why it's called a Length-Changing Prefix stall. (AFAIK, you'd get the same stall in 16-bit code for using a 32-bit immediate.)
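
To make the "length-changing" part concrete, here is a minimal sketch that computes how many bytes follow the opcode with and without the 66 prefix. It assumes 32/64-bit mode, covers only the two CMP forms from the question (with a register-direct ModRM for 83 /7), and the function names are hypothetical.

```python
def bytes_after_opcode(opcode: int, has_66_prefix: bool) -> int:
    """Bytes following the opcode in 32/64-bit mode, for the two CMP forms here."""
    if opcode == 0x3D:
        # CMP AX/EAX, imm: the immediate size follows the operand size.
        return 2 if has_66_prefix else 4
    if opcode == 0x83:
        # CMP r/m, imm8: one ModRM byte (register-direct) plus one imm8 byte.
        return 2
    raise ValueError("opcode not covered by this sketch")

def prefix_changes_length(opcode: int) -> bool:
    """True if adding the 66 prefix changes the length of the rest of the instruction."""
    return bytes_after_opcode(opcode, True) != bytes_after_opcode(opcode, False)

print(prefix_changes_length(0x3D))  # True: imm32 becomes imm16
print(prefix_changes_length(0x83))  # False: imm8 stays imm8
```

Only the 3D form changes length under the prefix, which is the condition the answer describes for the stall.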

The imm8 encoding doesn't cause a problem on any microarchitecture I know of, so favour it. See Agner Fog's microarch.pdf, and other links from the x86 tag wiki.

It can be worth using a longer instruction to avoid an LCP stall. (For example, if you know the upper 16 bits of the register are zero or sign-extended, using 32-bit operand size can avoid the LCP stall.)
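
As a rough illustration of that trade-off, take an immediate that does not fit in imm8, such as 0x1234 (a value chosen for this example, not from the question). The sketch below, with hypothetical helper names and assuming 32/64-bit mode, compares the 16-bit form with the 32-bit form that needs no prefix. The 32-bit form gives the same result only under the condition stated above about the upper 16 bits of the register.

```python
import struct

def encode_cmp_ax_imm16(imm: int) -> bytes:
    """66 3D iw: CMP AX, imm16. Needs the length-changing 66 prefix."""
    return bytes([0x66, 0x3D]) + struct.pack("<H", imm & 0xFFFF)

def encode_cmp_eax_imm32(imm: int) -> bytes:
    """3D id: CMP EAX, imm32. No operand-size prefix in 32/64-bit mode."""
    return bytes([0x3D]) + struct.pack("<I", imm & 0xFFFFFFFF)

short_form = encode_cmp_ax_imm16(0x1234)
long_form = encode_cmp_eax_imm32(0x1234)
print(short_form.hex(), len(short_form))  # 663d3412 4
print(long_form.hex(), len(long_form))    # 3d34120000 5
```

The 32-bit form is one byte longer but has no 66 prefix in front of an opcode whose immediate size depends on it.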

Intel SnB-family CPUs have a uop cache, so instructions don't always have to be re-decoded before executing. Still, the uop cache is small, so avoiding LCP stalls is worth it.

Of course, if you're tuning for AMD, then this isn't a factor. I forget if Atom and Silvermont decoders also have LCP stalls.

Re: part2:

663d is prefix+opcode for cmp ax, imm16. 663dfb doesn't "work" because it consumes the first byte of the following instruction. When the decoder sees 66 3D, it grabs the next 2 bytes from the instruction stream as the immediate.
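
The same point can be shown with a toy decoder. This is a sketch that only understands the two 16-bit CMP forms from the question (66 prefix required, register-direct ModRM only); the function and table names are invented for the example and it is not how a real disassembler is structured.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_cmp16(code: bytes):
    """Decode one 16-bit CMP at the start of `code`; return (text, length)."""
    if len(code) < 2 or code[0] != 0x66:
        raise ValueError("expected a 66 prefix followed by an opcode")
    opcode = code[1]
    if opcode == 0x3D:
        # CMP AX, imm16: the opcode is followed by a 2-byte immediate.
        if len(code) < 4:
            raise ValueError(f"truncated: need 4 bytes, got {len(code)}")
        imm = int.from_bytes(code[2:4], "little", signed=True)
        return f"cmp ax, {imm}", 4
    if opcode == 0x83:
        # CMP r/m16, imm8: the opcode is followed by ModRM and a 1-byte immediate.
        if len(code) < 4:
            raise ValueError(f"truncated: need 4 bytes, got {len(code)}")
        modrm = code[2]
        if modrm >> 6 != 3 or (modrm >> 3) & 7 != 7:
            raise ValueError("only register-direct /7 is handled in this sketch")
        imm = int.from_bytes(code[3:4], "little", signed=True)
        return f"cmp {REG16[modrm & 7]}, {imm}", 4
    raise ValueError("opcode not covered by this sketch")

print(decode_cmp16(bytes.fromhex("6683fb00")))  # ('cmp bx, 0', 4)
print(decode_cmp16(bytes.fromhex("663dfbff")))  # ('cmp ax, -5', 4)
try:
    decode_cmp16(bytes.fromhex("663dfb"))
except ValueError as err:
    print(err)  # truncated: need 4 bytes, got 3
```

So 6683fb00 is a complete instruction (ModRM fb selects bx, and the immediate is 00), while 663dfb is one byte short of its 2-byte immediate.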


answered Sep 29 '22 19:09

Peter Cordes
