Is there a more or less reliable way to tell whether the data at some location in memory is the beginning of a processor instruction or some other data?
For example, E8 3F BD 6A 00 may be a call instruction (E8) with a relative offset of 0x6ABD3F, or it might be three bytes of data belonging to some other instruction, followed by push 0 (6A 00).
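To make the ambiguity concrete, here is a small sketch that reads the same five bytes both ways. The load_address value is made up purely for illustration; only the byte sequence comes from the example above.

```python
import struct

raw = bytes.fromhex("E83FBD6A00")

# Reading 1: E8 is CALL rel32; the next four bytes are a little-endian
# signed offset.
(rel32,) = struct.unpack_from("<i", raw, 1)
print(hex(rel32))  # 0x6abd3f

# The call target is relative to the address of the NEXT instruction.
# load_address is a hypothetical value, not taken from any real program.
load_address = 0x00401000
call_target = (load_address + len(raw) + rel32) & 0xFFFFFFFF
print(hex(call_target))  # 0xaacd44

# Reading 2: if decoding really starts three bytes later, the tail is
# PUSH imm8 (opcode 6A) with operand 0.
tail = raw[3:]
is_push_imm8 = tail[0] == 0x6A
push_operand = tail[1]
print(is_push_imm8, push_operand)  # True 0
```

Both readings are valid x86; nothing in the bytes themselves says which one the CPU will ever use.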
I know the question sounds silly and there is probably no simple way, but maybe the instruction set was designed with this problem in mind, and maybe some simple code examining ±100 bytes around the location can give an answer that is very likely correct.
I want to know this because I scan a program's code and replace all calls to some function with calls to my replacement. It has worked so far, but as I increase the number of functions I'm replacing, it is not impossible that some data will look exactly like a call to one of those addresses and get replaced, which will break the program in a most unexpected fashion. I want to reduce the probability of that.
If it is your own code (or other code that retains linking and debug info), the best way is to scan the symbol and relocation tables in the object file. Otherwise, there is no reliable way to determine whether a given byte is instruction or data.
Possibly the most effective method of classifying bytes is recursive disassembly, i.e. disassembling from the entry point and from every jump destination found along the way. But this is not completely reliable, because it does not traverse jump tables (you can try heuristics for those, but they are not completely reliable either).
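A rough sketch of the recursive traversal idea follows. The OPCODES table is a hypothetical mini-decoder covering only the six opcodes the demo buffer needs; real x86 requires a full disassembler, and the names find_instruction_starts and code are invented for this illustration.

```python
import struct

# Hypothetical mini-decoder: opcode -> (instruction length, kind).
# This is NOT a real x86 decoder, just enough for the demo below.
OPCODES = {
    0x90: (1, "plain"),   # nop
    0x6A: (2, "plain"),   # push imm8
    0xC3: (1, "stop"),    # ret
    0xEB: (2, "jmp8"),    # jmp rel8
    0xE9: (5, "jmp32"),   # jmp rel32
    0xE8: (5, "call32"),  # call rel32
}

def find_instruction_starts(code, entry):
    """Return offsets reached by following control flow from entry."""
    starts = set()
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in starts:
            if code[pc] not in OPCODES:
                break  # unknown opcode: give up on this path
            length, kind = OPCODES[code[pc]]
            if pc + length > len(code):
                break  # instruction would run past the buffer
            starts.add(pc)
            nxt = pc + length
            if kind == "jmp8":
                (rel,) = struct.unpack_from("<b", code, pc + 1)
                worklist.append(nxt + rel)
                break  # unconditional jump: no fall-through
            if kind == "jmp32":
                (rel,) = struct.unpack_from("<i", code, pc + 1)
                worklist.append(nxt + rel)
                break
            if kind == "call32":
                (rel,) = struct.unpack_from("<i", code, pc + 1)
                worklist.append(nxt + rel)  # callee; execution also resumes at nxt
            if kind == "stop":
                break
            pc = nxt
    return starts

# jmp +5 | five data bytes that look like a call | push 0 | ret
code = bytes.fromhex("EB05" "E83FBD6A00" "6A00" "C3")
print(sorted(find_instruction_starts(code, 0)))  # [0, 7, 9]
```

Offset 2, where the E8 byte sits, is never reached, so a patcher that only touches offsets in the returned set would leave it alone. A target reached only through a jump table or a function pointer would be missed in exactly the same way, which is the limitation described above.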
A solution to your problem would be to patch the function being replaced itself: overwrite its beginning with a jump instruction to your function.
Unfortunately, there is no 100% reliable way to distinguish code from data. From the CPU's point of view, code is code only when some jump opcode induces the processor into trying to execute the bytes as if they were code. You could try to perform a control flow analysis by beginning with the program entry point and following all possible execution paths, but this may fail in the presence of function pointers, whose targets do not appear as direct jump or call operands.
For your specific problem: I gather that you want to replace an existing function with a replacement of your own. I suggest that you patch the replaced function itself. I.e., instead of locating all calls to the foo() function and replacing them with a call to bar(), just replace the first bytes of foo() with a jump to bar() (a jmp, not a call: you do not want to mess with the stack). This is less satisfactory because of the double jump, but it is reliable.
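As a minimal sketch of what the patch bytes look like, assuming the 5-byte jmp rel32 form (opcode E9) and a bar() that lies within the signed 32-bit range of foo(): the helper name make_jmp_patch and the two addresses are invented for illustration.

```python
import struct

def make_jmp_patch(foo_addr, bar_addr):
    """Return the 5 bytes of `jmp rel32` that, written at foo_addr, jump to bar_addr."""
    # The offset is measured from the end of the 5-byte jmp instruction.
    rel32 = bar_addr - (foo_addr + 5)
    if not -2**31 <= rel32 < 2**31:
        raise ValueError("target is out of rel32 range")
    return b"\xE9" + struct.pack("<i", rel32)

# Hypothetical addresses for foo() and bar().
patch = make_jmp_patch(0x00401000, 0x00402000)
print(patch.hex())  # e9fb0f0000
```

Actually writing these bytes over the start of foo() is a separate step: the code page usually has to be made writable first, and the overwritten bytes of foo() are gone unless you save them.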