
Why is fastcall slower than stdcall?

I found the following question: Is fastcall really faster?

No clear answers for x86 were given, so I decided to write a benchmark.

Here is the code:

    #include <stdio.h>
    #include <tchar.h>
    #include <time.h>

    int __fastcall func(int i)
    {
        return i + 5;
    }

    int __stdcall func2(int i)
    {
        return i + 5;
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        int iter = 100;
        int x = 0;

        // Time the __fastcall version.
        clock_t t = clock();
        for (int j = 0; j <= iter; j++)
            for (int i = 0; i <= 1000000; i++)
                x = func(x & 0xFF);
        printf("%ld\n", (long)(clock() - t));

        // Time the __stdcall version.
        t = clock();
        for (int j = 0; j <= iter; j++)
            for (int i = 0; i <= 1000000; i++)
                x = func2(x & 0xFF);
        printf("%ld\n", (long)(clock() - t));

        // Print x so the loops have an observable result.
        printf("%d", x);
        return 0;
    }

With optimization disabled, the result in MSVC 10 is:

    4671
    4414

With maximum optimization fastcall is sometimes faster, but I guess that is multitasking noise. Here is the average result (with iter = 5000):

    6638
    6487

stdcall looks faster!

Here are the results for GCC: http://ideone.com/hHcfP Again, fastcall lost the race.

Here is part of the disassembly for fastcall:

    011917EF  pop         ecx
    011917F0  mov         dword ptr [ebp-8],ecx
        return i + 5;
    011917F3  mov         eax,dword ptr [i]
    011917F6  add         eax,5

And this is for stdcall:

        return i + 5;
    0119184E  mov         eax,dword ptr [i]
    01191851  add         eax,5

i is passed via ECX instead of the stack, but it is saved to the stack in the function body, so the whole benefit is negated! This simple function could be computed using only registers, yet there is no real difference between the two versions.

Can anyone explain what the reason for fastcall is? Why doesn't it give a speedup?

Edit: With optimization enabled, it turned out that both functions were inlined. When I turned inlining off, both compiled to:

    00B71000  add         eax,5
    00B71003  ret

This looks like a great optimization indeed, but it doesn't respect the calling conventions at all, so the test is not fair.
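One possible way to keep the comparison honest, offered as a hedged sketch and not as the asker's actual fix: mark both functions as non-inlinable and call them through a `volatile` function pointer, so the compiler has to emit a real call that uses the declared convention. The macro names `NOINLINE`, `FASTCALL` and `STDCALL`, the function names, and the `chain_*` helpers are hypothetical. On targets other than 32-bit x86 the convention macros are assumed to expand to nothing, since the keywords are ignored or unavailable there.

```cpp
// Hypothetical portability macros (not from the original post).
#if defined(_MSC_VER)
  #define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
  #define NOINLINE __attribute__((noinline))
#else
  #define NOINLINE
#endif

#if defined(_MSC_VER) && defined(_M_IX86)
  #define FASTCALL __fastcall
  #define STDCALL  __stdcall
#elif defined(__GNUC__) && defined(__i386__)
  #define FASTCALL __attribute__((fastcall))
  #define STDCALL  __attribute__((stdcall))
#else
  // Assumption: the conventions only matter on 32-bit x86.
  #define FASTCALL
  #define STDCALL
#endif

// Same bodies as in the question, but never inlined.
NOINLINE int FASTCALL add5_fastcall(int i) { return i + 5; }
NOINLINE int STDCALL  add5_stdcall(int i)  { return i + 5; }

// Pointer types that carry the calling convention.
typedef int (FASTCALL *fastcall_fn)(int);
typedef int (STDCALL  *stdcall_fn)(int);

// Chains n calls, feeding each result back in like the original loop.
int chain_fastcall(int n) {
    fastcall_fn volatile f = add5_fastcall;  // volatile: reloaded and called indirectly
    int x = 0;
    for (int i = 0; i < n; ++i) x = f(x & 0xFF);
    return x;
}

int chain_stdcall(int n) {
    stdcall_fn volatile f = add5_stdcall;
    int x = 0;
    for (int i = 0; i < n; ++i) x = f(x & 0xFF);
    return x;
}
```

Because each call depends on the previous result and goes through a pointer the compiler cannot see through, the generated loop should contain an actual call per iteration; checking the disassembly is still the only way to be sure.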

Andrey asked Mar 29 '11 21:03



2 Answers

__fastcall was introduced a long time ago. At the time, Watcom C++ was beating Microsoft for optimization, and a number of reviewers picked out its register-based calling convention as one (possible) reason why.

Microsoft responded by adding __fastcall, and they've retained it ever since -- but I don't think they ever did much more than enough to be able to say "we have a register-based calling convention too..." Their preference (especially since the 32-bit migration) seems to be for __stdcall. They've put quite a bit of work into improving their code generation with it, but (apparently) not nearly so much with __fastcall. With on-chip caching, the gain from passing things in registers isn't nearly as great as it was then anyway.

Jerry Coffin answered Sep 21 '22 03:09


Your micro-benchmark produces irrelevant results. __fastcall has specific uses with SSE instructions (see XNAMath). clock() is not even remotely a suitable timer for benchmarking. __fastcall exists on multiple platforms, such as Itanium and some others, not just x86. In addition, your whole program can effectively be optimized to nothing except the printf statements, which makes the relative performance of __fastcall and __stdcall very, very irrelevant.

Finally, you've overlooked the main reason a lot of things are done the way they are: legacy. __fastcall may well have been significant before compiler inlining became as aggressive and effective as it is today, and no compiler will remove __fastcall, as there will be programs that depend on it. That makes __fastcall a fact of life.
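To illustrate the timer and dead-code points, here is a minimal sketch of a timing helper. It assumes a C++11 compiler with `<chrono>` (newer than the MSVC 10 used in the question) and uses `std::chrono::steady_clock` in place of `clock()`. The names `add5`, `Timing`, `time_calls` and `best_of` are hypothetical, and keeping the fastest of several trials is just one common way to reduce scheduler noise, not something this answer prescribes.

```cpp
#include <chrono>

#if defined(_MSC_VER)
  #define NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
  #define NOINLINE __attribute__((noinline))
#else
  #define NOINLINE
#endif

// Hypothetical stand-in for the function under test.
NOINLINE int add5(int i) { return i + 5; }

struct Timing {
    int result;             // final value, returned so the work stays observable
    long long nanoseconds;  // elapsed wall time on a monotonic clock
};

// Times `iterations` dependent calls to `fn`.
// Fn is expected to be a function pointer type (with any calling convention).
template <typename Fn>
Timing time_calls(Fn fn, long iterations) {
    Fn volatile f = fn;  // volatile pointer: the call cannot be inlined away
    int x = 0;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) x = f(x & 0xFF);
    auto stop = std::chrono::steady_clock::now();
    long long ns = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    return Timing{x, ns};
}

// Repeats the measurement and keeps the fastest trial.
template <typename Fn>
long long best_of(Fn fn, long iterations, int trials) {
    long long best = time_calls(fn, iterations).nanoseconds;
    for (int t = 1; t < trials; ++t) {
        long long ns = time_calls(fn, iterations).nanoseconds;
        if (ns < best) best = ns;
    }
    return best;
}
```

A call such as `best_of(add5, 1000000, 10)` from `main` would give one number per function to compare. On 32-bit x86, passing a pointer to a `__fastcall` or `__stdcall` function instead of `add5` lets the template deduce the convention-qualified pointer type.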

Puppy answered Sep 20 '22 03:09