Is it possible to decode x86-64 instructions in reverse?

Q: How many instructions does x86 64 have?

al. states that the current x86-64 design “contains 981 unique mnemonics and a total of 3,684 instruction variants” [2].

Q: Do all x86 instructions have the same length?

x86 instructions can be anywhere between 1 and 15 bytes long. The length is defined separately for each instruction, depending on the available modes of operation of the instruction, the number of required operands and more.

Q: How big is the x86 instruction set?

The full x86 instruction set is large and complex (Intel's x86 instruction set manuals comprise over 2900 pages), and we do not cover it all in this guide. For example, there is a 16-bit subset of the x86 instruction set. Using the 16-bit programming model can be quite complex.

2 Answers

The basic format of x86 instructions is like this

x86 instruction format

Modern CPUs can support VEX and EVEX prefixes. In x86-64 there might also be the REX prefix at the beginning

Looking at the format it can easily be seen that the instructions aren't palindromes and you can't read from the end.

Regarding determining which instruction an arbitrary address belongs to, unfortunately it can't be done either, because x86 instructions are not self-synchronizable, and (generally) not aligned. You have to know exactly the begin of an instruction, otherwise the instruction will be decoded differently.

You can even give addresses that actually contain data and the CPU/disassembler will just decode those as code, because no one knows what those bytes actually mean. Jumping into the middle of instructions is often used for code obfuscation. The technique has also been applied for code size saving in the past, because a byte can be reused and has different meanings depending on which instruction it belongs to

Jump into the middle of instruction - in IA-32
Why are disassembled data becoming instructions?
Designing an instruction sequence so that it does something else if decoded with an offset
What is "overlapping instructions" obfuscation?
Jump into the instructions, is it a case objdump can't handle?
What is the reason for this method to call itself?

That said, it might be possible to guess in many cases since functions and loops are often aligned to 16 or 32 bytes, with NOPs padding around

134

answered Oct 19 '22 04:10

phuclv

An x86 instruction stream is not self-synchronizing, and can only be unambiguously decoded forward. You need to know a valid start-point to decode. The last byte of an immediate can be a 0x90 which decodes as a nop, or in general a 4-byte immediate or displacement can have byte-sequences that are valid instructions, or whatever other overlap possibilities with ModRM/SIB bytes looking like opcodes.

If you decode forward in code that isn't intentionally obfuscated, you often get back into sync with the "correct" instruction boundaries, so you might try remembering the instruction boundaries as a known-good point, and check that a decoding from a backwards-step candidate start address has an instruction boundary at your known-good point.

IDK if you could get more clever about finding more known-good points going backwards which further candidates also have to agree with.

Be sure to highlight backwards-decoded instructions for the user in red or gray or something, so they know it's not guaranteed reliable.

Another alternative is to require function symbols (extern functions, or any function with debug info).

GDB doesn't allow you to scroll upward (in layout reg mode), unless you're inside a function that it knows the start address. Then I guess it decodes from the function start address so it knows instruction boundaries when it gets to the part that fits in the window.

If you want to go backwards, you have to disas 0x12345, +16 to start decoding from there. Then you can scroll down, but if you get the insn boundary wrong you get garbage.

answered Oct 19 '22 04:10

Peter Cordes

Related questions
                            
                                I'm not exactly sure what this x86 Add instruction is doing
                            
                                128-bit shifts using assembly language?
                            
                                Intel X86-64 assembly tutorials or book [closed]
                            
                                make: Circular dependency dropped
                            
                                Optimal SIMD algorithm to rotate or transpose an array
                            
                                What is the meaning of the data32 data32 nopw %cs:0x0(%rax,%rax,1) instruction in gcc inline asm?
                            
                                How Does BIOS initialize DRAM?
                            
                                Long long in c99
                            
                                Executing a flat binary file under Linux
                            
                                When should carry flag be set in assembly language
                            
                                accessing array's element in assembly language (windows)
                            
                                Looking for a fast polygon rendering algorithm
                            
                                How to convert a positive number to negative in assembly
                            
                                call subroutines conditionally in assembly
                            
                                Why do assembly programs have separate segments?
                            
                                Assembly Square Brackets
                            
                                Wrong gcc generated assembly ordering, results in performance hit
                            
                                Near call/jump tables don't always work in a bootloader
                            
                                sqrt of uint64_t vs. int64_t
                            
                                MOVZX missing 32 bit register to 64 bit register

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is it possible to decode x86-64 instructions in reverse?

Tags:

x86

assembly

x86-64

machine-code

disassembly

user3696459

People also ask

2 Answers

phuclv

Peter Cordes

Recent Activity

Donate For Us