
Function body on heap

A program has three sections: text, data and stack. The function body lives in the text section. Can we let a function body live on the heap? Because we can manipulate memory on the heap more freely, we may gain more freedom to manipulate functions.

In the following C code, I copy the text of the hello function onto the heap and then point a function pointer to it. The program compiles fine with gcc but gives "Segmentation fault" when running.

Could you tell me why? If my program cannot be repaired, could you provide a way to let a function live on the heap? Thanks!

Turing.robot

#include "stdio.h"
#include "stdlib.h"
#include "string.h"

void
hello()
{
    printf( "Hello World!\n");
}

int main(void)
{
    void (*fp)();

    int size = 10000;     //  large enough to contain hello()
    char* buffer;
    buffer = (char*) malloc ( size );
    memcpy( buffer,(char*)hello,size );
    fp = buffer;
    fp();
    free (buffer);

    return 0;
}
user666639 asked Oct 25 '25 14:10


2 Answers

My examples below are for Linux x86_64 with gcc, but similar considerations should apply on other systems.

Can we let a function body live on the heap?

Yes, absolutely we can. Placing machine code in memory at run time and then executing it is usually called JIT (just-in-time) compilation.

Because we can manipulate memory on the heap more freely, we may gain more freedom to manipulate functions.

Exactly, that's why higher-level languages like JavaScript have JIT compilers.

In the following C code, I copy the text of the hello function onto the heap and then point a function pointer to it. The program compiles fine with gcc but gives "Segmentation fault" when running.

Actually, there are multiple causes of "Segmentation fault" in that code.

The first one comes from this line:

 int size = 10000;     //  large enough to contain hello()

If you look at the x86_64 machine code that gcc generates for your hello function, it compiles down to a mere 17 bytes:

0000000000400626 <hello>:
  400626:   55                      push   %rbp
  400627:   48 89 e5                mov    %rsp,%rbp
  40062a:   bf 98 07 40 00          mov    $0x400798,%edi
  40062f:   e8 9c fe ff ff          call  4004d0 <puts@plt>
  400634:   90                      nop
  400635:   5d                      pop    %rbp
  400636:   c3                      retq   

So when you try to copy 10,000 bytes, memcpy reads far past the end of hello. Once it runs into memory that is not mapped, you get a "Segmentation fault".

Secondly, you allocate memory with malloc, which gives you memory that is readable and writable but not executable: on Linux x86_64 the CPU refuses to execute instructions from such pages. Jumping into it gives you another "Segmentation fault".

Under the hood, malloc uses system calls like brk, sbrk, and mmap to allocate memory. What you need to do instead is allocate executable memory yourself, using the mmap system call with PROT_EXEC protection.

Thirdly, when gcc compiles your hello function, you don't really know which optimisations it will apply or what the resulting machine code will look like.

For example, look at the fourth instruction of the compiled hello function:

40062f: e8 9c fe ff ff          call  4004d0 <puts@plt>

gcc optimised it to call the puts function instead of printf, but that is not even the main problem.

On x86 architectures you normally call functions with the call assembly mnemonic. However, call is not a single machine instruction: there are several different encodings it can assemble to (see the Intel manual, Vol. 2A, page 3-123, for reference).

In your case the compiler has chosen relative addressing for the call instruction.

You can tell because your call instruction has the e8 opcode:

E8 - Call near, relative, displacement relative to next instruction. 32-bit displacement sign extended to 64-bits in 64-bit mode.

This basically means that the instruction pointer jumps by a relative number of bytes, counted from the address of the next instruction.

Now, when you relocate your code to the heap with memcpy, you copy that relative call unchanged. Its displacement is then applied to the new location of the code on the heap, so the call no longer lands on puts@plt. The address it does land on will most likely not exist, and you will get another "Segmentation fault".

If my program cannot be repaired, could you provide a way to let a function live on the heap? Thanks!

Below is working code. Here is what I do:

  1. Execute printf once to make sure gcc includes it in our binary.
  2. Copy the correct number of bytes to the heap, so that we do not access memory that does not exist.
  3. Allocate executable memory with mmap and the PROT_EXEC option.
  4. Pass the printf function as an argument to our heap_function, so that gcc calls it indirectly through an absolute address instead of emitting a relative call instruction.

Here is the code:

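#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <sys/mman.h>


// `printf` is variadic, so the pointer type has to be variadic too.
typedef int (*printf_t)(const char* format, ...);
typedef int (*heap_function_t)(printf_t myprintf, const char* format,
                               const char* str, int a, int b);


// Everything this function touches arrives as an argument. Even the
// format string is passed in, because a string literal inside the body
// may be addressed relative to the instruction pointer (for example in
// a PIE build), and that would break once the code is copied.
int heap_function(printf_t myprintf, const char* format,
                  const char* str, int a, int b) {
    myprintf(format, str);
    return a + b;
}

// Marks the end of `heap_function`. This assumes that `gcc` places the
// two functions back to back in source order, which is not guaranteed.
int heap_function_end() {
    return 0;
}


int main(void) {
    // By printing something here, `gcc` will include `printf`
    // function at some address (`0x4004d0` in my case) in our binary.
    printf("%s", "Just including printf in binary\n");

    // Allocate the correct size of
    // executable `PROT_EXEC` memory.
    size_t size = (size_t) ((intptr_t) heap_function_end - (intptr_t) heap_function);
    char* buffer = (char*) mmap(0, size,
         PROT_EXEC | PROT_READ | PROT_WRITE,
         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buffer == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    memcpy(buffer, (char*)heap_function, size);

    // Call our function
    heap_function_t fp = (heap_function_t) buffer;
    int res = fp(printf, "%s", "Hello world, from heap!\n", 1, 2);
    printf("a + b = %i\n", res);

    munmap(buffer, size);
    return 0;
}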

Save in main.c and run with:

gcc -o main main.c && ./main
Vad answered Oct 27 '25 02:10


In principle it is doable. However, you are copying from "hello", which contains machine instructions that may call, reference, or jump to other addresses. Some of these addresses are fixed up when the application loads, so simply copying the code and calling into it would crash. Some systems, such as Windows, also have Data Execution Prevention, which stops memory that holds data from being executed, as a security measure. And how large is "hello"? Trying to copy past the end of it would likely also crash. You also depend on how the compiler implements "hello". Needless to say, this would be very compiler- and platform-dependent, if it worked at all.

Kharina Tigerfish answered Oct 27 '25 04:10