In C++, I'm trying to write a wrapper around a 64-bit integer. My expectation is that, if it is written correctly and all methods are inlined, such a wrapper should perform as well as the underlying type. The answer to this question on SO seems to agree with my expectation.
I wrote this code to test my expectation:
class B
{
private:
uint64_t _v;
public:
inline B() {};
inline B(uint64_t v) : _v(v) {};
inline B& operator=(B rhs) { _v = rhs._v; return *this; };
inline B& operator+=(B rhs) { _v += rhs._v; return *this; };
inline operator uint64_t() const { return _v; };
};
int main(int argc, char* argv[])
{
typedef uint64_t T;
//typedef B T;
const unsigned int x = 100000000;
Utils::CTimer timer;
timer.start();
T sum = 0;
for (unsigned int i = 0; i < 100; ++i)
{
for (uint64_t f = 0; f < x; ++f)
{
sum += f;
}
}
float time = timer.GetSeconds();
cout << sum << endl
<< time << " seconds" << endl;
return 0;
}
When I run this with typedef B T; instead of typedef uint64_t T; the reported times are consistently 10% slower when compiled with VC++. With g++, performance is the same whether I use the wrapper or not.
Since g++ does it, I guess there is no technical reason why VC++ cannot optimise this correctly. Is there something I could do to make it optimise it?
I already tried playing with the optimisation flags, with no success.
For the record, this is what the assembly generated by g++ and clang++ at -O2 translates to (in both the wrapper and non-wrapper cases), modulo the timing part:
sum = 499999995000000000;
cout << sum << endl;
In other words, it optimized the loop out entirely. Regardless of how hard you try to vectorize the loop, it's rather hard to beat not looping at all :)
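As a sanity check on that constant, here is a minimal sketch of the closed form the compilers appear to have folded the loops into. The names closed_form_sum and looped_sum are hypothetical helpers introduced for illustration; they are not part of the original code or of any compiler output.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): closed form of
//   for (i < reps) for (f < n) sum += f;
// i.e. reps * (0 + 1 + ... + (n - 1)) = reps * n * (n - 1) / 2.
constexpr std::uint64_t closed_form_sum(std::uint64_t reps, std::uint64_t n)
{
    // Halve whichever factor is even so the division is exact
    // and the intermediate product stays as small as possible.
    return reps * ((n % 2 == 0) ? (n / 2) * (n - 1) : n * ((n - 1) / 2));
}

// Reference: the same double loop as in the question, for cross-checking
// the closed form on small inputs.
inline std::uint64_t looped_sum(std::uint64_t reps, std::uint64_t n)
{
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < reps; ++i)
        for (std::uint64_t f = 0; f < n; ++f)
            sum += f;
    return sum;
}

// With the question's parameters (100 outer iterations, x = 100000000)
// the closed form reproduces the constant shown above.
static_assert(closed_form_sum(100, 100000000) == 499999995000000000ULL,
              "matches the constant in the optimized output");
```

Since the whole computation reduces to a compile-time constant, a timing of this loop under g++ or clang++ at -O2 measures essentially nothing, for the wrapper and for the plain integer alike.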
Using /O2 (maximize speed), both alternatives generate exactly the same assembly with Visual Studio 2012. This is your code, minus the timing and output:
00FB1000 push ebp
00FB1001 mov ebp,esp
00FB1003 and esp,0FFFFFFF8h
00FB1006 sub esp,8
00FB1009 mov edx,64h
00FB100E mov edi,edi
00FB1010 xorps xmm0,xmm0
00FB1013 movlpd qword ptr [esp],xmm0
00FB1018 mov ecx,dword ptr [esp+4]
00FB101C mov eax,dword ptr [esp]
00FB101F nop
00FB1020 add eax,1
00FB1023 adc ecx,0
00FB1026 jne main+2Fh (0FB102Fh)
00FB1028 cmp eax,5F5E100h
00FB102D jb main+20h (0FB1020h)
00FB102F dec edx
00FB1030 jne main+10h (0FB1010h)
00FB1032 xor eax,eax
I'd presume that the measured times fluctuate or are not always correct.
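One way to check whether the 10% gap is just noise is to repeat each measurement several times and compare the best runs. The following is only a sketch under stated assumptions: it swaps the question's Utils::CTimer for std::chrono::steady_clock, and Measurement and measure are hypothetical names introduced here. Taking the minimum over several trials is my choice, not something from the original post.

```cpp
#include <chrono>
#include <cstdint>
#include <limits>

// Wrapper from the question, repeated so the sketch is self-contained.
class B
{
    std::uint64_t _v;
public:
    B() : _v(0) {}
    B(std::uint64_t v) : _v(v) {}
    B& operator+=(B rhs) { _v += rhs._v; return *this; }
    operator std::uint64_t() const { return _v; }
};

// Hypothetical result type: the computed sum and the fastest trial.
struct Measurement
{
    std::uint64_t sum;
    double best_seconds;
};

// Runs the question's double loop `trials` times with accumulator type T
// and keeps the shortest wall-clock time, which is less sensitive to
// scheduler noise than a single run. If trials <= 0, best_seconds stays
// at the maximum double value.
template <typename T>
Measurement measure(unsigned reps, std::uint64_t n, int trials)
{
    Measurement m = { 0, std::numeric_limits<double>::max() };
    for (int t = 0; t < trials; ++t)
    {
        auto start = std::chrono::steady_clock::now();
        T sum = 0;
        for (unsigned i = 0; i < reps; ++i)
            for (std::uint64_t f = 0; f < n; ++f)
                sum += f;
        auto stop = std::chrono::steady_clock::now();

        double s = std::chrono::duration<double>(stop - start).count();
        if (s < m.best_seconds)
            m.best_seconds = s;
        m.sum = sum; // B converts back through operator uint64_t
    }
    return m;
}
```

Calling measure<std::uint64_t>(100, 100000000, 5) and measure<B>(100, 100000000, 5) and comparing best_seconds should give a steadier picture than a single timed run. Note that an optimizer may still remove the loops entirely, in which case both times will be close to zero.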