PE file opcodes

I'm just in the process of writing a PE file parser and I've reached the point where I'd like to parse and interpret the actual code within PE files, which I'm assuming are stored as x86 opcodes.

As an example, each of the exports within a DLL points to an RVA (Relative Virtual Address) that says where the function will be located in memory, and I've written a function to convert these RVAs to physical file offsets.
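For reference, an RVA-to-file-offset conversion of this kind might look like the minimal Python sketch below. The `Section` tuple and `rva_to_file_offset` are hypothetical names, and the section values in the usage line are made up for illustration; they are not the real layout of any DLL. It assumes the section table has already been parsed from the PE headers.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    """Subset of a PE section header (field names are illustrative)."""
    virtual_address: int      # RVA where the section starts in memory
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # file offset where the section's data starts
    size_of_raw_data: int     # size of the section's data in the file


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Map an RVA to a file offset, or return None if it is not file-backed."""
    for s in sections:
        span = max(s.virtual_size, s.size_of_raw_data)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.size_of_raw_data:
                return None  # part of the section that exists only in memory
            return s.pointer_to_raw_data + delta
    return None  # RVA does not fall inside any section


# Made-up .text section: RVA 0x1000, raw data at file offset 0x400.
text = Section(0x1000, 0x2F00, 0x400, 0x3000)
print(hex(rva_to_file_offset(0x305D, [text])))
```

With these invented values the RVA 0x305D maps to file offset 0x245D, because 0x305D - 0x1000 + 0x400 = 0x245D.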

The question is, are these really opcodes, or are they something else?

Does the way the functions are stored within the file depend on the compiler/linker, or are they one- or two-byte x86 opcodes?

As an example, the Windows 7 DLL 'BWContextHandler.dll' contains four functions that are loaded into memory, making them available within the system. The first exported function is 'DllCanUnloadNow', and it is located at offset 0x245D within the file. The first four bytes of this data are: 0xA1 0x5C 0xF1 0xF2

So are these one or two byte opcodes, or are they something else entirely?

If anyone can provide any information on how to examine these, it would be appreciated.

Thanks!

After a bit of further reading, and running the file through the demo version of IDA, I think I'm correct in saying that the first byte, 0xA1, is a one-byte opcode meaning mov eax. I got that from here: http://ref.x86asm.net/geek32.html#xA1 and I'm assuming it is correct for the time being.

However, I'm a bit confused as to how the bytes following comprise the rest of the instruction. From the x86 assembler that I know, a move instruction requires two parameters, the destination and the source, so the instruction is to move (something) into the eax register, and I'm assuming that the something comes in the following bytes. However I don't know how to read that information yet :)

Tony asked Dec 07 '12 13:12


1 Answer

x86 uses a complex, variable-length, multi-byte encoding, so you can't decode an instruction just by finding a single row in an instruction table, as you could with a RISC architecture (MIPS/SPARC/DLX). A single instruction can be up to 15 bytes long: several prefixes (including the multi-byte VEX prefix), a 1-3 byte opcode, and then several fields that encode an immediate value or a memory address, offset and scaling (imm, ModR/M and SIB; moffs). A single mnemonic sometimes has dozens of opcodes. On top of that, in some cases the same asm line has two possible encodings ("inc eax" is both 0x40 and 0xff 0xc0).

one byte opcode, meaning mov eax. I got that from here: http://ref.x86asm.net/geek32.html#xA1 and I'm assuming it is correct for the time being.

Let's take a look at the table:

po ; flds ; mnemonic ; op1 ; op2 ; grp1 ; grp2 ; Description

A1 ; W ; MOV ; eAX ; Ov ; gen ; datamov ; Move ;

(HINT: don't use the geek32 table; switch to http://ref.x86asm.net/coder32.html#xA1, which has fewer fields and does more of the decoding for you, e.g. "A1 MOV eAX moffs16/32 Move")

The op1 and op2 columns (http://ref.x86asm.net/#column_op) describe the operands. For the A1 opcode the first operand is always eAX, and the second (op2) is Ov. According to the table at http://ref.x86asm.net/#Instruction-Operand-Codes:

O / moffs Original The instruction has no ModR/M byte; the offset of the operand is coded as a word, double word or quad word (depending on address size attribute) in the instruction. No base register, index register, or scaling factor can be applied (only MOV (A0, A1, A2, A3)).

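So the A1 opcode is followed by an encoded memory offset. In 32-bit mode (with no address-size prefix) that offset is 32 bits wide and stored little-endian, so in "A1 5C F1 F2 05" the four bytes after the opcode give the address 0x05F2F15C, and the instruction is "mov eax, [0x05F2F15C]".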
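As a rough illustration of the moffs operand described above, this Python sketch decodes the first instruction of the function by hand. It assumes 32-bit code with no operand-size or address-size prefix in front of the opcode; `decode_mov_eax_moffs32` is a made-up helper name, not part of any library.

```python
import struct

# First five bytes of DllCanUnloadNow from the question.
code = bytes([0xA1, 0x5C, 0xF1, 0xF2, 0x05])


def decode_mov_eax_moffs32(buf: bytes) -> str:
    """Decode 'A1 moffs32' (mov eax, [addr]), assuming 32-bit mode, no prefixes."""
    if buf[0] != 0xA1:
        raise ValueError("not opcode A1")
    # The 4 bytes after the opcode are a little-endian absolute address.
    (addr,) = struct.unpack_from("<I", buf, 1)
    return f"mov eax, [0x{addr:08X}]"


print(decode_mov_eax_moffs32(code))  # mov eax, [0x05F2F15C]
```

The whole instruction is therefore five bytes long: one opcode byte plus a four-byte address, with no ModR/M byte in between.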

PS: If your task is to parse PE files rather than to write your own disassembler, use an existing x86 disassembly library such as libdisasm or libudis86, or any similar one.

PPS: For original question:

The question is, are these really opcodes, or are they something else?

Yes, "A1 5C F1 F2 05 B9 5C F1 F2 05 FF 50 0C F7 D8 1B C0 F7 D8 C3 CC CC CC CC CC" is x86 machine code.
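To show how those bytes split into variable-length instructions, here is a deliberately tiny Python sketch. It assumes 32-bit code and handles only the handful of opcodes that occur in this exact byte sequence; `decode_modrm`, `decode_one` and `disassemble` are illustrative names. It is no substitute for a real disassembler library.

```python
import struct

REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_modrm(buf, pos):
    """Return (reg_field, operand_text, bytes_used) for the ModR/M byte at pos.

    Handles only register-direct (mod=3) and [reg+disp8] (mod=1, no SIB).
    """
    modrm = buf[pos]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:                      # register-direct operand
        return reg, REGS32[rm], 1
    if mod == 1 and rm != 4:          # [reg + disp8], no SIB byte
        disp = struct.unpack_from("<b", buf, pos + 1)[0]
        return reg, f"[{REGS32[rm]}{disp:+#x}]", 2
    raise NotImplementedError(f"ModR/M byte {modrm:#04x} not handled by this sketch")


def decode_one(buf, pos):
    """Decode one instruction at pos; return (length_in_bytes, text)."""
    op = buf[pos]
    if op == 0xA1:                    # mov eax, moffs32
        addr = struct.unpack_from("<I", buf, pos + 1)[0]
        return 5, f"mov eax, [{addr:#010x}]"
    if 0xB8 <= op <= 0xBF:            # mov r32, imm32 (register is in the opcode)
        imm = struct.unpack_from("<I", buf, pos + 1)[0]
        return 5, f"mov {REGS32[op - 0xB8]}, {imm:#010x}"
    if op == 0x1B:                    # sbb r32, r/m32
        reg, rm_text, used = decode_modrm(buf, pos + 1)
        return 1 + used, f"sbb {REGS32[reg]}, {rm_text}"
    if op in (0xF7, 0xFF):            # opcode groups: the reg field selects the operation
        reg, rm_text, used = decode_modrm(buf, pos + 1)
        if op == 0xF7 and reg == 3:
            return 1 + used, f"neg {rm_text}"
        if op == 0xFF and reg == 2:
            return 1 + used, f"call {rm_text}"
    if op == 0xC3:
        return 1, "ret"
    if op == 0xCC:
        return 1, "int3"
    raise NotImplementedError(f"opcode {op:#04x} not handled by this sketch")


def disassemble(buf):
    """Walk the buffer and return a list of (offset, length, text) tuples."""
    pos, out = 0, []
    while pos < len(buf):
        length, text = decode_one(buf, pos)
        out.append((pos, length, text))
        pos += length
    return out


code = bytes.fromhex(
    "A1 5C F1 F2 05 B9 5C F1 F2 05 FF 50 0C F7 D8 1B C0 F7 D8 C3 CC CC CC CC CC"
)
for offset, length, text in disassemble(code):
    raw = " ".join(f"{b:02X}" for b in code[offset:offset + length])
    print(f"{offset:02X}  {raw:<15} {text}")
```

Under these assumptions the 25 bytes split into instructions of 5, 5, 3, 2, 2, 2 and 1 bytes, followed by five single-byte int3 instructions, which is the usual padding between functions.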

osgx answered Sep 30 '22 22:09