Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Executing code in mmap to produce executable code segfaults

I'm trying to write a function that copies a function (and ends up modify its assembly) and returns it. This works fine for one level of indirection, but at two I get a segfault.

Here is a minimum (not)working example:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define BODY_SIZE 100

int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }

int (*g(void))(void) {
    void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy(r, f, BODY_SIZE);
    return r;
}

int (*(*h(void))(void))(void) {
    void *r = mmap(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy(r, g, BODY_SIZE);
    return r;
}

int main() {
    printf("%d\n", f());
    printf("%d\n", G()());
    printf("%d\n", g()());
    printf("%d\n", H()()());
    printf("%d\n", h()()()); // This one fails - why?

    return 0;
}

I can memcpy into an mmap'ed area once to create a valid function that can be called (g()()). But if I try to apply it again (h()()()) it segfaults. I have confirmed that it correctly creates the copied version of g, but when I execute that version I get a segfault.

Is there some reason why I can't execute code in one mmap'ed area from another mmap'ed area? From exploratory gdb-ing with x/i checks it seems like I can call down successfully, but when I return the function I came from has been erased and replaced with 0s.

How can I get this behaviour to work? Is it even possible?

BIG EDIT:

Many have asked for my rationale as I am obviously doing an XY problem here. That is true and intentional. You see, a little under a month ago this question was posted on the code golf stack exchange. It also got itself a nice bounty for a C/Assembly solution. I gave some idle thought to the problem and realized that by copying a functions body while stubbing out an address with some unique value I could search its memory for that value and replace it with a valid address, thus allowing me to effectively create lambda functions that take a single pointer as an argument. Using this I could get single currying working, but I need the more general currying. Thus my current partial solution is linked here. This is the full code that exhibits the segfault I am trying to avoid. While this is pretty much the definition of a bad idea, I find it entertaining and would like to know if my approach is viable or not. The only thing I'm missing is ability to run a function created from a function, but I can't get that to work.

like image 833
LambdaBeta Avatar asked Dec 07 '25 06:12

LambdaBeta


2 Answers

The code is using relative calls to invoke mmap and memcpy so the copied code ends up calling an invalid location.

You can invoke them through a pointer, e.g.:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define BODY_SIZE 100

void* (*mmap_ptr)(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset) = mmap;
void* (*memcpy_ptr)(void *dest, const void *src, size_t n) = memcpy;

int f(void) { return 42; }
int (*G(void))(void) { return f; }
int (*(*H(void))(void))(void) { return G; }

int (*g(void))(void) {
    void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy_ptr(r, f, BODY_SIZE);
    return r;
}

int (*(*h(void))(void))(void) {
    void *r = mmap_ptr(0, BODY_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    memcpy_ptr(r, g, BODY_SIZE);
    return r;
}

int main() {
    printf("%d\n", f());
    printf("%d\n", G()());
    printf("%d\n", g()());
    printf("%d\n", H()()());
    printf("%d\n", h()()()); // This one fails - why?

    return 0;
}
like image 132
Jester Avatar answered Dec 10 '25 00:12

Jester


I'm trying to write a function that copies a function

I think that is pragmatically not the right approach, unless you know very well machine code for your platform (and then you would not ask the question). Be aware of position independent code (useful because in general mmap(2) would use ASLR and give some "randomness" in the addresses). BTW, genuine self-modifying machine code (i.e. changing some bytes of some existing valid machine code) is today cache and branch-predictor unfriendly and should be avoided in practice.

I suggest two related approaches (choose one of them).

  • Generate some temporary C file (see also this), e.g. in /tmp/generated.c, then fork a compilation using gcc -Wall -g -O -fPIC /tmp/generated.c -shared -o /tmp/generated.so of it into a plugin, then dlopen(3) (for dynamic loading) that /tmp/generated.so shared object plugin (and probably use dlsym(3) to find function pointers in it...). For more about shared objects, read Drepper's How To Write Shared Libraries paper. Today, you can dlopen many hundreds of thousands of such shared libraries (see my manydl.c example) and C compilers (like recent GCC) are fast enough to compile a few thousand lines of code in a time compatible with interaction (e.g. less than a tenth of second). Generating C code is a widely used practice. In practice you would represent some AST in memory of the generated C code before emitting it.

  • Use some JIT compilation library, such as GCCJIT, or LLVM, or libjit, or asmjit, etc.... which would generate a function in memory, do the required relocations, and give you some pointer to it.

BTW, instead of coding in C, you might consider using some homoiconic language implementation (such as SBCL for Common Lisp, which compiles to machine code at every REPL interaction, or any dynamically contructed S-expr program representation).

The notions of closures and of callbacks are worthwhile to know. Read SICP and perhaps Lisp In Small Pieces (and of course the Dragon Book, for general compiler culture).

like image 42
Basile Starynkevitch Avatar answered Dec 10 '25 02:12

Basile Starynkevitch



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!