Is the fastcall calling convention really faster than other calling conventions, such as cdecl? Are there any benchmarks out there that show how performance is affected by calling convention?
The __fastcall calling convention specifies that arguments to functions are to be passed in registers, when possible. This calling convention only applies to the x86 architecture.
The __cdecl keyword instructs the compiler to use the C calling convention: arguments are pushed on the stack from right to left and the caller cleans the stack, which is what makes vararg functions possible. To set the __cdecl calling convention for a function, place the keyword immediately before the function name or at the beginning of the declarator.
__stdcall is the standard calling convention for Win32 system calls. Wikipedia covers the details. It primarily matters when you are calling a function outside of your code (e.g. an OS API) or the OS is calling you (as is the case with WinMain).
__thiscall is the default calling convention used by member functions that don't use variable arguments (vararg functions). Under __thiscall, the callee cleans the stack, which is impossible for vararg functions. Arguments are pushed on the stack from right to left.
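As a rough illustration of the keyword placement described above, here is a minimal sketch in C. The macro names (CC_CDECL, CC_STDCALL, CC_FASTCALL) and the function names are made up for this example. It assumes the keywords are only meaningful on 32-bit x86 with a Microsoft-compatible compiler, so on any other target the macros expand to nothing and all three functions use the platform's default convention.

```c
/* Hypothetical helper macros (not part of any standard header).
   Assumption: the convention keywords only matter for 32-bit x86 builds
   with a Microsoft-compatible compiler; elsewhere they expand to nothing. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CC_CDECL    __cdecl
#  define CC_STDCALL  __stdcall
#  define CC_FASTCALL __fastcall
#else
#  define CC_CDECL
#  define CC_STDCALL
#  define CC_FASTCALL
#endif

/* The keyword goes immediately before the function name. */

/* cdecl: arguments on the stack, caller cleans up. */
int CC_CDECL add_cdecl(int a, int b) { return a + b; }

/* stdcall: arguments on the stack, callee cleans up. */
int CC_STDCALL add_stdcall(int a, int b) { return a + b; }

/* fastcall: arguments passed in registers when possible. */
int CC_FASTCALL add_fastcall(int a, int b) { return a + b; }
```

The convention changes how the arguments travel, not what the function computes, so all three return the same result for the same inputs.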
It depends on the platform. For a Xenon PowerPC, for example, it can be an order of magnitude difference due to a load-hit-store issue with passing data on the stack. I empirically timed the overhead of a cdecl function at about 45 cycles compared to ~4 for a fastcall.
For an out-of-order x86 (Intel and AMD), the impact may be much less, because the registers are all shadowed and renamed anyway.
The answer really is that you need to benchmark it yourself on the particular platform you care about.
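If you want to try that measurement yourself, something along these lines could be a starting point. This is only a sketch under several assumptions: the macro and function names are hypothetical, the convention keywords are applied only on 32-bit x86 with a Microsoft-compatible compiler (on other targets both functions end up with the same default convention, so any difference is just noise), and clock() is a coarse timer, so a large iteration count is needed for meaningful numbers. Calls go through a volatile function pointer so the compiler cannot inline them away.

```c
#include <time.h>

/* Hypothetical macros: apply the keywords only where they are meaningful. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CC_CDECL    __cdecl
#  define CC_FASTCALL __fastcall
#else
#  define CC_CDECL
#  define CC_FASTCALL
#endif

/* Two otherwise identical functions that differ only in convention. */
int CC_CDECL    sum_cdecl(int a, int b)    { return a + b; }
int CC_FASTCALL sum_fastcall(int a, int b) { return a + b; }

typedef int (CC_CDECL    *cdecl_fn)(int, int);
typedef int (CC_FASTCALL *fastcall_fn)(int, int);

/* Times `iterations` calls of the cdecl function and returns elapsed CPU
   seconds. The final accumulator is written to *result_out so the loop
   has an observable effect. Keep iterations below INT_MAX. */
double time_cdecl(long iterations, int *result_out) {
    cdecl_fn volatile fn = sum_cdecl;   /* volatile pointer forces a real call */
    int acc = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        acc = fn(acc, 1);
    }
    clock_t end = clock();
    *result_out = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Same loop for the fastcall function. */
double time_fastcall(long iterations, int *result_out) {
    fastcall_fn volatile fn = sum_fastcall;
    int acc = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        acc = fn(acc, 1);
    }
    clock_t end = clock();
    *result_out = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call both functions with the same iteration count from your own main, repeat the runs a few times, and compare the elapsed times. The measured figure includes loop and indirect-call overhead, so the difference between the two numbers is more informative than either number on its own.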