I have code which calls a lot of
int myNumber = (int)(floatNumber);
which takes up, in total, around 10% of my CPU time (according to profiler). While I could leave it at that, I wonder if there are faster options, so I tried searching around, and stumbled upon
http://devmaster.net/forums/topic/7804-fast-int-float-conversion-routines/ http://stereopsis.com/FPU.html
I tried implementing the Real2Int() function given there, but it gives me wrong results, and runs slower. Now I wonder, are there faster implementations to floor double / float values to integers, or is the SSE2 version as fast as it gets? The pages I found date back a bit, so it might just be outdated, and newer STL is faster at this.
The current implementation does:
013B1030 call _ftol2_sse (13B19A0h)
013B19A0 cmp dword ptr [___sse2_available (13B3378h)],0
013B19A7 je _ftol2 (13B19D6h)
013B19A9 push ebp
013B19AA mov ebp,esp
013B19AC sub esp,8
013B19AF and esp,0FFFFFFF8h
013B19B2 fstp qword ptr [esp]
013B19B5 cvttsd2si eax,mmword ptr [esp]
013B19BA leave
013B19BB ret
Related questions I found:
Fast float to int conversion and floating point precision on ARM (iPhone 3GS/4)
What is the fastest way to convert float to int on x86
Since both are old, or are ARM based, I wonder if there are current ways to do this. Note that it says the best conversion is one that doesn't happen, but I need to have it, so that will not be possible.
It's going to be hard to beat that if you are targeting generic x86 hardware. The runtime doesn't know for sure that the target machine has an SSE unit. If it did, it could do what the x64 compiler does and inline a cvttss2si
opcode. But since the runtime has to check whether an SSE unit is available, you are left with the current implementation. That's what the implementation of ftol2_sse
does. And what's more it passes the value in an x87 register and then transfers it to an SSE register if an SSE unit is available.
You could tell the x86 compiler to target machines that have SSE units. Then the compiler would indeed emit a simple cvttss2si
opcode inline. That's going to be as fast as you can get. But if you run the code on an older machine then it will fail. Perhaps you could supply two versions, one for machines with SSE, and one for those without.
That's not going to gain you all that much. It's just going to avoid all the overhead of ftol2_sse
that happens before you actually reach the cvttss2si
opcode that does the work.
To change the compiler settings from the IDE, use Project > Properties > Configuration Properties > C/C++ > Code Generation > Enable Enhanced Instruction Set. On the command line it is /arch:SSE or /arch:SSE2.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With