How to write self-modifying code in x86 assembly

I'm looking at writing a JIT compiler for a hobby virtual machine I've been working on recently. I know a bit of assembly (I'm mainly a C programmer; I can read most assembly with a reference for the opcodes I don't understand, and write some simple programs), but I'm having a hard time understanding the few examples of self-modifying code I've found online.

This is one such example: http://asm.sourceforge.net/articles/smc.html

The example program provided does about four different modifications when run, none of which are clearly explained. Linux kernel interrupts are used several times, and aren't explained or detailed. (The author moved data into several registers before calling the interrupts. I assume he was passing arguments, but these arguments aren't explained at all, leaving the reader to guess.)

What I'm looking for is the simplest, most straightforward example in code of a self-modifying program. Something that I can look at, and use to understand how self-modifying code in x86 assembly has to be written, and how it works. Are there any resources you can point me to, or any examples you can give that would adequately demonstrate this?

I'm using NASM as my assembler.

EDIT: I'm also running this code on Linux.

870

asked Jan 27 '11 04:01

jakogut

1 Answer

Wow, this turned out to be a lot more painful than I expected. 100% of the pain was Linux protecting the program from being overwritten and/or from executing data.

Two solutions are shown below. A lot of googling was involved: the fairly simple part, putting some instruction bytes in memory and executing them, was mine; the mprotect call and the alignment on the page size were culled from Google searches, stuff I had to learn for this example.

The self-modifying code itself is straightforward. If you take the program, or at least just the two simple functions, compile it and then disassemble it, you will get the machine code bytes for those instructions. Or use nasm to assemble blocks of assembly, etc. From this I determined the bytes that load an immediate into eax and then return: b8 followed by the 32-bit immediate in little-endian order, then c3.
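As a minimal sketch of that encoding step, the helper below writes the six bytes for `mov eax, imm32; ret` into a caller-supplied buffer. The name `emit_mov_eax_ret` is hypothetical (it does not appear in the answer); the byte values are the ones from the disassembly quoted in the listing further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the original answer.
 * Writes "mov eax, imm32; ret" into buf and returns the number of
 * bytes emitted (always 6). buf must have room for 6 bytes. */
size_t emit_mov_eax_ret(unsigned char *buf, uint32_t imm)
{
    assert(buf != NULL);
    buf[0] = 0xb8;                            /* opcode: mov eax, imm32 */
    buf[1] = (unsigned char)(imm & 0xff);     /* immediate, little-endian */
    buf[2] = (unsigned char)((imm >> 8) & 0xff);
    buf[3] = (unsigned char)((imm >> 16) & 0xff);
    buf[4] = (unsigned char)((imm >> 24) & 0xff);
    buf[5] = 0xc3;                            /* opcode: ret */
    return 6;
}
```

Calling `emit_mov_eax_ret(buf, 13)` fills `buf` with `b8 0d 00 00 00 c3`, the same bytes the listing stores by hand.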

Ideally you simply put those bytes in some RAM and execute that RAM. To get Linux to let you do that you have to change the protection, which means you have to pass mprotect a pointer that is aligned on a page boundary. So allocate more than you need, find the address within that allocation that sits on a page boundary, call mprotect on that address, then put your opcodes in that memory and execute it.
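A possible alternative to aligning a malloc'ed pointer by hand is to ask mmap for the memory, since mmap returns page-aligned addresses. POSIX leaves mprotect unspecified for memory that did not come from mmap, although Linux accepts it. The sketch below is a swapped-in technique, not the answer's method: it maps one page read/write, copies the stub in, flips the page to read/execute, and calls it. The function name `run_const_stub` is hypothetical, and the code assumes an x86 or x86-64 CPU (it returns -1 elsewhere).

```c
#define _DEFAULT_SOURCE /* MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not part of the original answer.
 * Builds "mov eax, value; ret" in a freshly mapped page and calls it.
 * Returns 0 and stores the stub's return value in *out on success;
 * returns -1 if a system call fails or the CPU is not x86. */
int run_const_stub(uint32_t value, uint32_t *out)
{
    assert(out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    unsigned char code[6] = { 0xb8, 0, 0, 0, 0, 0xc3 };
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize <= 0) return -1;
    size_t len = (size_t)pagesize;

    /* the imm32 operand is stored little-endian */
    code[1] = (unsigned char)(value & 0xff);
    code[2] = (unsigned char)((value >> 8) & 0xff);
    code[3] = (unsigned char)((value >> 16) & 0xff);
    code[4] = (unsigned char)((value >> 24) & 0xff);

    /* mmap hands back a page-aligned address, so no manual alignment */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    memcpy(mem, code, sizeof code);

    /* drop write permission before executing */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    unsigned int (*fn)(void) = (unsigned int (*)(void))(uintptr_t)mem;
    *out = fn();
    munmap(mem, len);
    return 0;
#else
    (void)value;
    return -1;
#endif
}
```

Writing first and only then switching the page to executable avoids ever holding a page that is writable and executable at the same time, which some hardened configurations may refuse. A JIT that patches code repeatedly would have to toggle the protection back and forth, or accept a read/write/execute mapping as the answer's listing does.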

The second example takes an existing function compiled into the program. Again, because of the protection mechanism you cannot simply point at it and change bytes; you have to make it writable first. So you back up to the prior page boundary and call mprotect with that address and enough bytes to cover the code to be modified. Then you can change the bytes/opcodes for that function in any way you want (so long as you don't spill over into any function you want to continue to use) and execute it. In this case you can see that fun() works, then I change it to simply return a value, call it again, and now it has been modified.
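The "back up to the prior page boundary" arithmetic can be sketched on its own, without touching any memory. The helper name `covering_range` is hypothetical; it assumes the page size is a power of two, and it rounds the length up to whole pages so that a patch straddling a page boundary is covered as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the original answer.
 * Given the address and length of the bytes to patch, computes the
 * page-aligned start and the page-multiple length to hand to mprotect. */
void covering_range(uintptr_t addr, size_t len, uintptr_t pagesize,
                    uintptr_t *start, size_t *length)
{
    /* pagesize must be a non-zero power of two for the masks to work */
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    assert(start != NULL && length != NULL);

    uintptr_t offset = addr & (pagesize - 1);   /* distance into the page */
    uintptr_t span = offset + len;              /* bytes from page start */

    *start = addr - offset;                     /* prior page boundary */
    /* round the span up to a whole number of pages */
    *length = (size_t)((span + pagesize - 1) & ~(pagesize - 1));
}
```

The masks are computed in `uintptr_t` on purpose: with a 32-bit `unsigned int` page size, `~(pagesize - 1)` would be zero-extended on a 64-bit system and would clear the upper half of the address.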

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    //build without optimization (e.g. gcc -O0) so the calls to fun() are not inlined

    unsigned char *testfun;

    unsigned int fun(unsigned int a)
    {
        return (a + 13);
    }

    unsigned int fun2(void)
    {
        return (13);
    }

    int main(void)
    {
        unsigned int ra;
        uintptr_t pagesize; //pointer-sized so the page mask keeps the upper address bits
        unsigned char *ptr;
        uintptr_t offset;

        pagesize = (uintptr_t)getpagesize();
        testfun = malloc(1023 + pagesize + 1);
        if (testfun == NULL) return (1);
        //need to align the address on a page boundary
        printf("%p\n", (void *)testfun);
        testfun = (unsigned char *)(((uintptr_t)testfun + pagesize - 1) & ~(pagesize - 1));
        printf("%p\n", (void *)testfun);

        if (mprotect(testfun, 1024, PROT_READ | PROT_EXEC | PROT_WRITE)) {
            printf("mprotect failed\n");
            return (1);
        }

        //400687: b8 0d 00 00 00          mov    $0xd,%eax
        //40068d: c3                      retq

        testfun[0] = 0xb8;
        testfun[1] = 0x0d;
        testfun[2] = 0x00;
        testfun[3] = 0x00;
        testfun[4] = 0x00;
        testfun[5] = 0xc3;

        ra = ((unsigned int (*)(void))testfun)();
        printf("0x%02X\n", ra);

        //same stub, different immediate
        testfun[0] = 0xb8;
        testfun[1] = 0x20;
        testfun[2] = 0x00;
        testfun[3] = 0x00;
        testfun[4] = 0x00;
        testfun[5] = 0xc3;

        ra = ((unsigned int (*)(void))testfun)();
        printf("0x%02X\n", ra);

        //second example: patch fun() in place
        printf("%p\n", (void *)(uintptr_t)fun);
        offset = (uintptr_t)fun & (pagesize - 1);
        ptr = (unsigned char *)((uintptr_t)fun & ~(pagesize - 1));

        printf("%p 0x%lX\n", (void *)ptr, (unsigned long)offset);

        //offset + 6 covers the patched bytes even if fun() straddles a page boundary
        if (mprotect(ptr, offset + 6, PROT_READ | PROT_EXEC | PROT_WRITE)) {
            printf("mprotect failed\n");
            return (1);
        }

        //for(ra=0;ra<20;ra++) printf("0x%02X,",ptr[offset+ra]); printf("\n");

        ra = 4;
        ra = fun(ra);
        printf("0x%02X\n", ra);

        //overwrite the start of fun() with: mov $0x22,%eax ; ret
        ptr[offset + 0] = 0xb8;
        ptr[offset + 1] = 0x22;
        ptr[offset + 2] = 0x00;
        ptr[offset + 3] = 0x00;
        ptr[offset + 4] = 0x00;
        ptr[offset + 5] = 0xc3;

        ra = 4;
        ra = fun(ra);
        printf("0x%02X\n", ra);

        return (0);
    }

165

answered Sep 30 '22 19:09

old_timer

Tags: x86, assembly, jit, vm-implementation, self-modifying
