I'm writing my own JIT-interpreter. How do I execute generated instructions?

Tags: c, x86, assembly

I intend to write my own JIT-interpreter as part of a course on VMs. I have a lot of knowledge about high-level languages, compilers and interpreters, but little or no knowledge about x86 assembly (or C for that matter).

Actually, I don't know how a JIT works, but here is my take on it: read in the program in some intermediate language, compile it to x86 instructions, ensure that the last instruction returns to somewhere sane back in the VM code, store the instructions somewhere in memory, and do an unconditional jump to the first instruction. Voila!

So, with that in mind, I have the following small C program:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    int *m = malloc(sizeof(int));
    *m = 0x90; // NOP instruction code

    asm("jmp *%0"
               : /* outputs:  */ /* none */
               : /* inputs:   */ "d" (m)
               : /* clobbers: */ "eax");

    return 42;

}

Okay, so my intention is for this program to store the NOP instruction somewhere in memory, jump to that location and then probably crash (because I haven't set up any way for the program to return to main).

Question: Am I on the right path?

Question: Could you show me a modified program that manages to find its way back to somewhere inside main?

Question: Other issues I should beware of?

PS: My goal is to gain understanding, not necessarily do everything the right way.


Thanks for all the feedback. The following code seems to be the place to start and works on my Linux box:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

unsigned char *m;

int main() {
        unsigned int pagesize = getpagesize();
        printf("pagesize: %u\n", pagesize);

        m = malloc(1023+pagesize+1);
        if(m==NULL) return(1);

        printf("%p\n", m);
        m = (unsigned char *)(((long)m + pagesize-1) & ~(pagesize-1));
        printf("%p\n", m);

        if(mprotect(m, 1024, PROT_READ|PROT_EXEC|PROT_WRITE)) {
                printf("mprotect fail...\n");
                return 0;
        }

        m[0] = 0xc9; //leave
        m[1] = 0xc3; //ret
        m[2] = 0x90; //nop

        printf("%p\n", m);


        asm("jmp *%0"
                   : /* outputs:  */ /* none */
                   : /* inputs:   */ "d" (m)
                   : /* clobbers: */ "ebx");

        return 21;
}
Magnus Madsen asked Jan 26 '12 08:01


2 Answers

Question: Am I on the right path?

I would say yes.

Question: Could you show me a modified program that manages to find its way back to somewhere inside main?

I haven't got any code for you, but a better way to get to the generated code and back is to use a pair of call/ret instructions, as they will manage the return address automatically.
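
For illustration, here is a minimal sketch of the call/ret idea. It assumes an x86 or x86-64 Linux-like system; the names jit_fn and run_generated are made up for this example. Calling through a function pointer makes the compiler emit the call, which pushes the return address, and the ret at the end of the generated bytes pops it again, so control lands back in the C code.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <string.h>
#include <sys/mman.h>

/* Hypothetical type for generated code: no arguments, int result. */
typedef int (*jit_fn)(void);

/* Hypothetical helper: copies a tiny routine into executable memory,
 * calls it and returns its result, or -1 if the setup failed. */
static int run_generated(void)
{
    /* Assumed x86 / x86-64 machine code. */
    static const unsigned char code[] = {
        0xb8, 0x2a, 0x00, 0x00, 0x00,   /* mov eax, 42 */
        0xc3                            /* ret         */
    };
    size_t len = 4096;

    /* Map the buffer writable first so the code can be copied in. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* Then switch it from writable to executable. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* The call pushes the return address; the generated ret pops it.
     * Converting a data pointer to a function pointer is not defined
     * by ISO C, but it works on POSIX systems. */
    jit_fn fn = (jit_fn)mem;
    int result = fn();

    munmap(mem, len);
    return result;
}
```

Unlike a bare jmp, nothing special is needed to get back: the generated code only has to leave the stack pointer where it found it before executing ret.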

Question: Other issues I should beware of?

Yes - as a security measure, many operating systems would prevent you from executing code on the heap without making special arrangements. Those special arrangements typically amount to you having to mark the relevant memory page(s) as executable.

On Linux this is done using mprotect() with PROT_EXEC.
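
As a rough sketch of that arrangement, assuming Linux (the code_buffer_* helper names are invented for this example): the buffer below comes from mmap() rather than malloc(), so it is already page-aligned, and it is made executable only after the code has been written, because some hardened systems refuse memory that is writable and executable at the same time.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes of page-aligned memory that is
 * readable and writable, but not yet executable. NULL on failure. */
static unsigned char *code_buffer_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Hypothetical helper: after the instructions have been written,
 * switch the mapping from writable to executable.
 * Returns 0 on success, -1 on error (as mprotect does). */
static int code_buffer_seal(unsigned char *buf, size_t len)
{
    return mprotect(buf, len, PROT_READ | PROT_EXEC);
}

/* Hypothetical helper: release the mapping again. */
static void code_buffer_free(unsigned char *buf, size_t len)
{
    munmap(buf, len);
}
```

A caller would allocate at least one page (for example sysconf(_SC_PAGESIZE) bytes), write the instruction bytes, call code_buffer_seal(), and only then jump or call into the buffer.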

NPE answered Nov 06 '22 08:11


If your generated code follows the proper calling convention, then you can declare a pointer-to-function type and invoke the function this way:

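typedef void (*generated_function)(void);

void *func = malloc(1024);   // must also be made executable, e.g. with mprotect()
unsigned char *o = (unsigned char *)func;
generated_function func_exec = (generated_function)func;   // the typedef is already a pointer type

*o++ = 0x90;     // NOP
*o++ = 0xc3;     // RET (near return; 0xcb would be a far return)

func_exec();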
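
To show what following the calling convention buys you, here is a hedged sketch that passes an argument to the generated code and reads back its result. It assumes x86-64 with the System V ABI (Linux), where the first integer argument arrives in edi and the result is returned in eax; int_to_int_fn and jit_add_one are hypothetical names for this example.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <string.h>
#include <sys/mman.h>

/* The generated code must behave like a C function of this type. */
typedef int (*int_to_int_fn)(int);

/* Hypothetical helper: builds "return x + 1" as machine code and calls it.
 * Returns -1 if the executable buffer could not be set up (in a real
 * program this would need a separate error channel). */
static int jit_add_one(int x)
{
    /* Assumed x86-64 System V: argument in edi, result in eax. */
    static const unsigned char code[] = {
        0x8d, 0x47, 0x01,   /* lea eax, [rdi + 1] */
        0xc3                /* ret                */
    };
    size_t len = 4096;

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* The compiler places x in edi and reads the result from eax,
     * exactly as it would for an ordinary C function. */
    int_to_int_fn fn = (int_to_int_fn)mem;
    int result = fn(x);

    munmap(mem, len);
    return result;
}
```

On 32-bit x86 the argument would be passed on the stack instead, so the generated bytes would have to change while the C side stays the same.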
Simon Richter answered Nov 06 '22 07:11