Let's suppose I have a function:
int f1(int x){
// some more or less complicated operations on x
return x;
}
And that I have another function:
int f2(int x){
// we simply return x
return x;
}
I would like to be able to do something like the following:
char* _f1 = (char*)f1;
char* _f2 = (char*)f2;
int i;
for (i=0; i<FUN_LENGTH; ++i){
_f1[i] = _f2[i]; // copy through the char* views, not the function names
}
I.e. I would like to interpret f1 and f2 as raw byte arrays and "overwrite f1 byte by byte", thus replacing it with f2.
I know that callable code is usually write-protected; however, in my particular situation, one can simply overwrite the memory location where f1 is located. That is, I can copy the bytes over onto f1, but afterwards, if I call f1, the whole thing crashes.
So, is my approach possible in principle? Or are there some machine/implementation/whatsoever-dependent issues I have to take into consideration?
It would be easier to replace the first few bytes of f1 with a machine jump instruction to the beginning of f2. That way, you won't have to deal with any possible code relocation issues.
Also, the information about how many bytes a function occupies (FUN_LENGTH in your question) is normally not available at runtime. Using a jump would avoid that problem too.
For x86, the relative jump instruction opcode you need is E9. This is a 32-bit relative jump, which means you need to calculate the relative offset between f2 and f1. This code might do it:
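// Assumes 32-bit x86, where a function address fits in an int.
int offset = (int)f2 - ((int)f1 + 5); // 5 bytes for size of instruction
unsigned char *pf1 = (unsigned char *)f1;
pf1[0] = 0xe9;                  // opcode: JMP rel32
pf1[1] = offset & 0xff;         // offset bytes, least significant first
pf1[2] = (offset >> 8) & 0xff;
pf1[3] = (offset >> 16) & 0xff;
pf1[4] = (offset >> 24) & 0xff;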
The offset is taken from the end of the JMP instruction, which is why 5 is added to the address of f1 in the offset calculation.
It's a good idea to step through the result with an assembly-level debugger to make sure you're poking the correct bytes. Of course, none of this is standards compliant, so if it breaks you get to keep both pieces.
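To check the byte layout without touching any live code, the same calculation can be done into a plain buffer first. This is only a sketch: encode_jmp_rel32 and JMP_REL32_SIZE are names made up for illustration, and the function merely produces the five bytes; it does not patch anything.

```c
#include <stdint.h>

/* Size of an x86 "JMP rel32" instruction: 1 opcode byte + 4 offset bytes. */
#define JMP_REL32_SIZE 5

/*
 * Hypothetical helper (not part of the answer above): write into out[0..4]
 * the encoding of a jump that sits at address `from` and lands on `to`.
 * Returns 0 on success, -1 if the distance does not fit in 32 bits.
 */
int encode_jmp_rel32(uint8_t out[JMP_REL32_SIZE], uintptr_t from, uintptr_t to)
{
    /* The displacement is measured from the end of the jump instruction. */
    int64_t delta = (int64_t)(intptr_t)(to - (from + JMP_REL32_SIZE));

    if (delta < INT32_MIN || delta > INT32_MAX)
        return -1; /* target too far away for a rel32 jump */

    uint32_t rel = (uint32_t)(int32_t)delta;

    out[0] = 0xE9;                           /* opcode: JMP rel32 */
    out[1] = (uint8_t)(rel & 0xff);          /* offset, least significant byte first */
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
    return 0;
}
```

For example, a jump placed at 0x1000 that should land on 0x2000 has a displacement of 0x2000 - 0x1005 = 0xFFB, so the buffer ends up holding E9 FB 0F 00 00. Comparing those bytes with what the debugger shows at f1 is a cheap sanity check.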
Your approach is undefined behavior according to the C standard.
And on many operating systems (e.g. Linux), your example will crash: the function code is inside the read-only .text segment (and section) of the ELF executable, and that segment is (sort-of) mmap-ed read-only by execve (or by dlopen or by the dynamic linker), so you cannot write inside it.
Instead of trying to overwrite the function (which you've already found is fragile at best), I'd consider using a pointer to a function:
int complex_implementation(int x) {
// do complex stuff with x
return x;
}
int simple_implementation(int x) {
return x;
}
int (*f1)(int) = complex_implementation;
You'd use it something like this:
for (int i=0; i<limit; i++) {
a = f1(a);
if (whatever_condition)
f1 = simple_implementation;
}
...and after the assignment, calling f1 would just return the input value.
Calling a function via a pointer does impose some overhead, but (thanks to that being common in OO languages) most compilers and CPUs do a pretty good job of minimizing that overhead.
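Put together as something that compiles, the idea might look like the sketch below. The body of complex_implementation (2 * x + 1) is just a placeholder, and run with its switch_after parameter is a hypothetical driver standing in for whatever condition triggers the switch in real code.

```c
/* Placeholder for the "complicated" version; the real body is up to you. */
int complex_implementation(int x)
{
    return 2 * x + 1;
}

/* The trivial replacement: hand the input straight back. */
int simple_implementation(int x)
{
    return x;
}

/* Callers always go through f1; it starts out pointing at the complex version. */
int (*f1)(int) = complex_implementation;

/*
 * Hypothetical driver: apply f1 `limit` times, and after `switch_after`
 * calls redirect f1 to the simple version.
 */
int run(int start, int limit, int switch_after)
{
    int a = start;
    for (int i = 0; i < limit; i++) {
        a = f1(a);
        if (i + 1 == switch_after)
            f1 = simple_implementation; /* "replace" f1 without touching any code bytes */
    }
    return a;
}
```

With these placeholder bodies, run(1, 5, 2) goes 1, 3, 7 and then stays at 7, because every call after the second lands in simple_implementation. No memory protection or instruction encoding is involved, which is what makes this version portable.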