When I compile my C source code (for example in a Linux environment), the compiler generates a file in a "machine readable" format.
Sometimes it will work, depending on the executable format and the libraries that you use. For example, things like allocating memory or creating a window all call OS functions. So you have to compile for the target OS, with those libraries linked in (statically or dynamically).
However, the instructions themselves are the same, as long as both systems run on the same CPU architecture. So, if your program doesn't use any of the OS functions (no standard or any other library), you could run it on another OS. The second problem is the executable format. A Windows .exe is very different from, for example, ELF. However, a flat format that contains just the instructions (such as .com) could be loaded on any system, provided something on that system copies the bytes into memory and jumps to them.
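To make the format difference concrete, here is a minimal sketch that tells the formats apart by their first bytes. ELF files start with the byte 0x7F followed by "ELF", and Windows executables start with "MZ". The function name guess_exe_format is made up for this example, and it only checks magic numbers; it is not a real parser.

```c
#include <stddef.h>
#include <string.h>

/* Guess the container format from the first bytes of a file image.
   Only the magic numbers are inspected. A flat file of raw instructions
   has no header at all, so it ends up in the "unknown" branch. */
const char *guess_exe_format(const unsigned char *buf, size_t len)
{
    /* The literal is split so that 'E' is not read as a hex digit. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "ELF";
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return "MZ/PE";
    return "unknown";
}
```

A loader on one OS rejects the other OS's header long before any instruction is executed, which is why identical instructions are not enough on their own.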
EDIT: A fun experiment would be to compile some functions to a flat format (just the instructions) on one OS (e.g. Windows). For example:
int add(int x, int y) { return x + y; }
Save just the instructions to a file, without any relocation or other staging info. Then, on a different OS (e.g. Linux) compile a full program that will do something like this:
typedef int (*PFUNC)(int, int); // pointer to a function like our add one
PFUNC p = malloc(200); // make sure you have enough space.
FILE *f = fopen("add.com", "rb");
fread(p, 200, 1, f); // Load the file contents into p
fclose(f);
int ten = p(4, 6);
For this to work, the memory holding the code must also be marked executable, which memory returned by malloc normally is not. The OS provides calls for this: mmap or mprotect with PROT_EXEC on POSIX systems, VirtualAlloc or VirtualProtect on Windows.
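Here is a sketch of the loading side of that experiment, assuming x86-64 Linux (or another POSIX system with mmap and mprotect). Instead of reading add.com from disk, it embeds the four bytes a compiler typically emits for add under the System V calling convention; those bytes, and the names run_flat_code and add_via_flat_code, are assumptions of this example and not part of the original experiment.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*PFUNC)(int, int);  /* pointer to a function like our add one */

/* x86-64 code for:  lea eax,[rdi+rsi] ; ret
   It expects x in EDI and y in ESI (System V). Code compiled for
   Windows would read ECX and EDX instead and would not work here. */
static const unsigned char add_code[] = { 0x8d, 0x04, 0x37, 0xc3 };

/* Copy len bytes of flat code into fresh memory, make that memory
   executable, call it as int(int, int) and store the result in *out.
   Returns 0 on success and -1 if the OS refuses the mapping. */
int run_flat_code(const unsigned char *code, size_t len,
                  int a, int b, int *out)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, len);

    /* Switch from writable to executable before calling. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Converting void* to a function pointer is not ISO C,
       but POSIX systems support it. */
    PFUNC p = (PFUNC)mem;
    *out = p(a, b);

    munmap(mem, len);
    return 0;
}

/* Convenience wrapper around the embedded add code. */
int add_via_flat_code(int a, int b, int *out)
{
    return run_flat_code(add_code, sizeof add_code, a, b, out);
}
```

Calling add_via_flat_code(4, 6, &ten) should leave 10 in ten on such a system. Some hardened systems forbid executable anonymous mappings, in which case the function returns -1.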
I have been asked what an ABI discrepancy is. I think it's best to explain with a simple example.
Consider a little silly function:
int f(int a, int b, int (*g)(int, int))
{
return g(a * 2, b * 3) * 4;
}
Compile it for x64/Windows and for x64/Linux.
For x64/Windows the compiler emits something like:
f:
sub rsp,28h
lea edx,[rdx+rdx*2]
add ecx,ecx
call r8
shl eax,2
add rsp,28h
ret
For x64/Linux, something like:
f:
sub $0x8,%rsp
lea (%rsi,%rsi,2),%esi
add %edi,%edi
callq *%rdx
add $0x8,%rsp
shl $0x2,%eax
retq
Allowing for different traditional notations of assembly language on Windows and Linux, there obviously are substantial differences in the code.
The Windows version clearly expects a to arrive in ECX (the lower half of the RCX register), b in EDX (the lower half of the RDX register), and g in the R8 register. This is mandated by the x64/Windows calling convention, which is part of the ABI (application binary interface). The code also prepares the arguments to g in ECX and EDX.
The Linux version expects a in EDI (the lower half of the RDI register), b in ESI (the lower half of the RSI register), and g in the RDX register. This is mandated by the calling convention of the System V AMD64 ABI (used on Linux and other Unix-like operating systems on x64). The code prepares the arguments to g in EDI and ESI.
Now imagine that we run a Windows program which somehow extracts the body of f from a Linux-targeted module and calls it:
int g(int a, int b);
typedef int (*G)(int, int);
typedef int (*F)(int, int, G);
F f = (F) load_linux_module_and_get_symbol("module.so", "f");
int result = f(3, 4, &g);
What is going to happen? Since functions on Windows expect their arguments in ECX, EDX and R8, the compiler will place the actual arguments in those registers:
mov edx,4
lea r8,[g]
lea ecx,[rdx-1]
call qword ptr [f1]
But the Linux-targeted version of f looks for its values elsewhere. In particular, it looks for the address of g in RDX. We have just set EDX to 4, which also clears the upper half of RDX, so RDX holds 4 rather than the address of any function. The program will most likely crash.
Running Windows-targeted code on a Linux system will produce the same effect.
Thus, we cannot run 'foreign' code without a thunk. A thunk is a piece of low-level code which rearranges arguments to allow calls between pieces of code that follow different sets of rules. (A thunk may have to do more than that, because an ABI covers more than the calling convention.) You typically cannot write a thunk in a high-level programming language.
Note that in our scenario we need to provide thunks for both f ('host-to-foreign') and g ('foreign-to-host').
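As an aside, on x86-64 GCC and Clang can generate this argument shuffling for you through the ms_abi function attribute, which marks a function (or function pointer type) as using the x64/Windows calling convention. The sketch below assumes one of those compilers. The names WIN_ABI, f_win, g_win and call_f_from_host are made up for the example, and everything is compiled in one module, so it only illustrates the calling-convention part and does not load a real foreign binary.

```c
/* Use the Windows x64 convention where the compiler supports the
   attribute; elsewhere the macro is empty and the code still builds. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define WIN_ABI __attribute__((ms_abi))
#else
#define WIN_ABI
#endif

/* Pointer to a function that takes its arguments in ECX and EDX. */
typedef int (WIN_ABI *G)(int, int);

/* Stands in for the callback g: follows the Windows convention. */
WIN_ABI int g_win(int a, int b)
{
    return a + b;
}

/* Stands in for the foreign f: receives a, b, g in ECX, EDX, R8
   and calls g with the Windows convention as well. */
WIN_ABI int f_win(int a, int b, G g)
{
    return g(a * 2, b * 3) * 4;
}

/* Host-side caller using the platform's default convention. At this
   call the compiler moves the arguments into the registers f_win
   expects, which is the job a hand-written thunk would otherwise do. */
int call_f_from_host(int a, int b)
{
    return f_win(a, b, g_win);
}
```

With a = 3 and b = 4, g receives 6 and 12, so call_f_from_host returns (6 + 12) * 4 = 72. This covers only the calling convention; other ABI differences would still need separate handling.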