
x86 way to tell instruction from data

Tags: c, x86, assembly

Is there a more or less reliable way to tell whether the data at some location in memory is the beginning of a processor instruction or just some other data?

For example, E8 3F BD 6A 00 may be a call instruction (E8) with a relative offset of 0x6ABD3F, or it may be three bytes of data belonging to some other instruction, followed by push 0 (6A 00).
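To make the two readings concrete, here is a minimal sketch in C that decodes the same five bytes both ways. It assumes 32-bit mode; the helper names (`read_rel32`, `call_target`, `is_push_imm8`) and the load address used when calling them are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian signed 32-bit displacement from p. */
static int32_t read_rel32(const uint8_t *p)
{
    assert(p != NULL);
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)u;
}

/* Reading 1: code[0] is opcode E8 (call rel32). The displacement is
 * relative to the end of the 5-byte instruction, so the target is
 * addr + 5 + rel32, where addr is the (assumed) address of code[0]. */
static uint32_t call_target(const uint8_t *code, uint32_t addr)
{
    assert(code[0] == 0xE8);
    return addr + 5u + (uint32_t)read_rel32(code + 1);
}

/* Reading 2: the two bytes at p are `push imm8` (opcode 6A).
 * Returns 1 and stores the immediate if so, 0 otherwise. */
static int is_push_imm8(const uint8_t *p, int8_t *imm)
{
    if (p[0] != 0x6A)
        return 0;
    *imm = (int8_t)p[1];
    return 1;
}
```

Calling `read_rel32` on the bytes after E8 yields 0x6ABD3F, while `is_push_imm8` on the last two bytes also succeeds with an immediate of 0. Both decodings are valid on their own; nothing in the bytes says which one the program actually executes.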

I know the question sounds silly and there is probably no simple way, but maybe the instruction set was designed with this problem in mind, and maybe some simple code examining ±100 bytes around the location can give an answer that is very likely correct.

I want to know this because I scan a program's code and replace all calls to some function with calls to my replacement. It has worked so far, but as I increase the number of functions I'm replacing, it is not impossible that at some point some data will look exactly like a call to one of those addresses and will be replaced, which will break the program in a most unexpected fashion. I want to reduce the probability of that.
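For reference, the kind of scan described above might look roughly like the following sketch. It is not the actual code in use; `find_call_candidates`, the `base` load address and the output buffer are assumptions made for illustration, and 32-bit addressing is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive scan: record every offset where byte E8 is followed by a rel32
 * that resolves to `target`. `base` is the assumed address of buf[0].
 * Returns the number of offsets stored in out (at most out_cap).
 * It cannot tell a real call from data that merely looks like one. */
static size_t find_call_candidates(const uint8_t *buf, size_t len,
                                   uint32_t base, uint32_t target,
                                   size_t *out, size_t out_cap)
{
    size_t found = 0;

    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i + 5 <= len; i++) {
        if (buf[i] != 0xE8)
            continue;
        uint32_t rel = (uint32_t)buf[i + 1] |
                       ((uint32_t)buf[i + 2] << 8) |
                       ((uint32_t)buf[i + 3] << 16) |
                       ((uint32_t)buf[i + 4] << 24);
        /* Unsigned arithmetic wraps modulo 2^32, which matches how a
         * negative displacement behaves in 32-bit mode. */
        uint32_t dest = base + (uint32_t)i + 5u + rel;
        if (dest == target && found < out_cap)
            out[found++] = i;
    }
    return found;
}
```

The failure mode is easy to reproduce: if the buffer holds a genuine `call` followed by `mov eax, 0x000FF5E8; add al, al`, the immediate of the `mov` starts with E8 and, together with the next byte, can resolve to the same target, so the scan reports two hits even though only one is a call.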

AUTOMATIC asked Dec 29 '22 04:12


2 Answers

If it is your own code (or other code that retains linking and debug information), the best way is to scan the symbol and relocation tables in the object file. Otherwise, there is no reliable way to determine whether a given byte is instruction or data.

Possibly the most efficient method for classifying bytes is recursive disassembly, i.e. disassembling code from the entry point and from every jump destination found. This is not completely reliable, though, because it does not traverse jump tables (you can try some heuristics for those, but they are not completely reliable either).
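As a rough illustration of the recursive approach, here is a toy sketch in C. It understands only a handful of one-byte 32-bit opcodes (nop, ret, push imm8, mov r32/imm32, call/jmp rel32, jmp/jcc rel8) and gives up on anything else; a real tool would need a full x86 length decoder. The names `mark_code`, `rd32` and `MAX_CODE` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CODE 256   /* arbitrary size limit for this sketch */

/* Read a little-endian signed 32-bit displacement. */
static int32_t rd32(const uint8_t *p)
{
    return (int32_t)((uint32_t)p[0] | (uint32_t)p[1] << 8 |
                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/* Set start[i] = 1 for every offset reachable as an instruction start
 * from `entry`. start[] must hold at least len bytes. */
static void mark_code(const uint8_t *buf, size_t len, size_t entry,
                      uint8_t *start)
{
    size_t work[MAX_CODE + 1];   /* offsets still to explore */
    size_t top = 0;

    assert(len <= MAX_CODE && entry < len);
    memset(start, 0, len);
    work[top++] = entry;

    while (top > 0) {
        size_t pc = work[--top];
        while (pc < len && !start[pc]) {
            uint8_t op = buf[pc];
            size_t n;          /* instruction length in bytes */
            int branch = 0;    /* has a relative branch target? */
            int falls = 1;     /* can execution continue past it? */

            if (op == 0x90)                      n = 1;                /* nop */
            else if (op == 0xC3)               { n = 1; falls = 0; }   /* ret */
            else if (op == 0x6A)                 n = 2;                /* push imm8 */
            else if (op >= 0xB8 && op <= 0xBF)   n = 5;                /* mov r32, imm32 */
            else if (op == 0xE8)               { n = 5; branch = 1; }  /* call rel32 */
            else if (op == 0xE9)               { n = 5; branch = 1; falls = 0; } /* jmp rel32 */
            else if (op == 0xEB)               { n = 2; branch = 1; falls = 0; } /* jmp rel8 */
            else if (op >= 0x70 && op <= 0x7F) { n = 2; branch = 1; }  /* jcc rel8 */
            else break;        /* unknown opcode: abandon this path */

            if (pc + n > len)
                break;         /* instruction runs off the buffer */
            start[pc] = 1;

            if (branch) {
                int64_t rel = (n == 5) ? rd32(buf + pc + 1)
                                       : (int8_t)buf[pc + 1];
                int64_t dest = (int64_t)(pc + n) + rel;
                if (dest >= 0 && (uint64_t)dest < len && top < MAX_CODE + 1)
                    work[top++] = (size_t)dest;
            }
            if (!falls)
                break;
            pc += n;
        }
    }
}
```

Given a buffer where a short `jmp` skips over five data bytes E8 3F BD 6A 00, the data is never marked because no path reaches it. The same property is the weakness: code reached only through a jump table or a function pointer is never marked either.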

The solution to your problem would be to patch the function being replaced itself: overwrite its beginning with a jump instruction to your function.

Vovanium answered Dec 31 '22 14:12


Unfortunately, there is no 100% reliable way to distinguish code from data. From the CPU's point of view, code is code only when some jump opcode induces the processor to try to execute the bytes as if they were code. You could attempt a control flow analysis, beginning at the program entry point and following all possible execution paths, but this may fail in the presence of function pointers.

For your specific problem: I gather that you want to replace an existing function with a replacement of your own. I suggest that you patch the replaced function itself. I.e., instead of locating all calls to the foo() function and replacing them with a call to bar(), just replace the first bytes of foo() with a jump to bar() (a jmp, not a call: you do not want to mess with the stack). This is less satisfactory because of the double jump, but it is reliable.

Thomas Pornin answered Dec 31 '22 14:12