For readability, I think the first code block below is better. But is the second code block faster?

First Block:
<pre class="prettyprint"><code>for (int i = 0; i < 5000; i++){
    int number = rand() % 10000 + 1;
    string fizzBuzz = GetStringFromFizzBuzzLogic(number);
}
</code></pre>

Second Block:
<pre class="prettyprint"><code>int number;
string fizzBuzz;
for (int i = 0; i < 5000; i++){
    number = rand() % 10000 + 1;
    fizzBuzz = GetStringFromFizzBuzzLogic(number);
}
</code></pre>

Does redeclaring variables in C++ cost anything?

I benchmarked this particular code, and even WITHOUT optimisation, both variants came to almost the same runtime. As soon as the lowest level of optimisation is turned on, the results are very close to identical (+/- a bit of noise in the time measurement).

Edit: the analysis of the generated assembler code below shows that it's hard to guess which form is faster. The answer most people would probably give is <code>func2</code>, but it turns out this function is a tiny bit slower, at least when compiling with clang++ and -O2. That is good evidence that "write code, benchmark, change code, benchmark" is the correct way to deal with performance, not guessing based on reading the code. And remember what someone told me: optimising is a bit like taking an onion apart in layers - once you optimise one part, you end up looking at something very similar, just a little smaller... ;)

However, my initial analysis made <code>func1</code> significantly slower. That turns out to be because the compiler, for some bizarre reason, doesn't optimise the <code>rand() % 10000 + 1</code> in <code>func1</code> but does in <code>func2</code> when optimisation is turned off, so <code>func1</code> pays for a slower modulo in the unoptimised build. Once optimisation is enabled, both functions get a "fast" modulo.
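To make the "fast" modulo concrete: the optimised loops in the assembly listing later in this answer replace the division with a multiplication by 1759218605 and a shift by 44 bits. The following is only a sketch of that idea for non-negative inputs (which is what <code>rand()</code> returns); the function name <code>mod_10000_by_multiply</code> is made up for illustration, and the compiler's own sequence additionally handles negative values.

```cpp
#include <cstdint>

// Hypothetical helper: computes x % 10000 without a divide instruction.
// Assumes x >= 0. 1759218605 is 2^44 / 10000 rounded up, the same constant
// that appears in the imulq instruction of the generated code.
inline int mod_10000_by_multiply(int x) {
    const std::uint64_t wide = static_cast<std::uint64_t>(x);
    // Multiply by the scaled reciprocal and shift: equals x / 10000
    // for every non-negative 32-bit int.
    const std::uint64_t quotient = (wide * 1759218605ULL) >> 44;
    // Remainder = x - quotient * 10000.
    return static_cast<int>(wide - quotient * 10000ULL);
}
```

There is normally no reason to write this by hand, since the compiler does it as soon as optimisation is on; it only shows what the unoptimised <code>func1</code> was missing.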
Using the Linux performance tool <code>perf</code> shows that with clang++ and -O2 we get the following for <code>func1</code>:
<pre class="prettyprint"><code> 15.76%  a.out  libc-2.20.so         free
 12.31%  a.out  libstdc++.so.6.0.20  std::string::_S_construct<char cons
 12.29%  a.out  libc-2.20.so         _int_malloc
 10.05%  a.out  a.out                func1
  7.26%  a.out  libc-2.20.so         __random
  6.36%  a.out  libc-2.20.so         malloc
  5.46%  a.out  libc-2.20.so         __random_r
  5.01%  a.out  libstdc++.so.6.0.20  std::basic_string<char, std::char_t
  4.83%  a.out  libstdc++.so.6.0.20  std::string::_Rep::_S_create
  4.01%  a.out  libc-2.20.so         strlen
</code></pre>

and for <code>func2</code>:
<pre class="prettyprint"><code> 17.88%  a.out  libc-2.20.so         free
 10.73%  a.out  libc-2.20.so         _int_malloc
  9.77%  a.out  libc-2.20.so         malloc
  9.03%  a.out  a.out                func2
  7.63%  a.out  libstdc++.so.6.0.20  std::string::_S_construct<char con
  6.96%  a.out  libstdc++.so.6.0.20  std::string::_Rep::_S_create
  4.48%  a.out  libc-2.20.so         __random
  4.39%  a.out  libc-2.20.so         __random_r
  4.10%  a.out  libc-2.20.so         strlen
</code></pre>

There are some subtle differences, but I would put those down to the relatively short runtime of the benchmark rather than to any difference in the code actually generated by the compiler.
This is with the following code:
<pre class="prettyprint"><code>#include <iostream>
#include <string>
#include <cstdlib>

#define N 500000

// Defined in a separate file so the call is not inlined away.
extern std::string GetStringFromFizzBuzzLogic(int number);

// Variables declared inside the loop.
void func1()
{
    for (int i = 0; i < N; i++){
        int number = rand() % 10000 + 1;
        std::string fizzBuzz = GetStringFromFizzBuzzLogic(number);
    }
}

// Variables declared once, outside the loop.
void func2()
{
    int number;
    std::string fizzBuzz;
    for (int i = 0; i < N; i++){
        number = rand() % 10000 + 1;
        fizzBuzz = GetStringFromFizzBuzzLogic(number);
    }
}

// Read the x86 time-stamp counter (GCC/Clang inline assembly).
static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

int main(int argc, char **argv)
{
    void (*f)();

    // No arguments: time func1. Any argument: time func2.
    if (argc == 1)
        f = func1;
    else
        f = func2;

    for(int i = 0; i < 5; i++)
    {
        unsigned long long t1 = rdtsc();
        f();
        t1 = rdtsc() - t1;
        std::cout << "time=" << t1 << std::endl;
    }
}
</code></pre>

and in a separate file:
<pre class="prettyprint"><code>#include <string>

std::string GetStringFromFizzBuzzLogic(int number)
{
    return "SomeString";
}
</code></pre>

Running with func1:
<pre class="prettyprint"><code>./a.out
time=876016390
time=824149942
time=826812600
time=825266315
time=826151399
</code></pre>

Running with func2:
<pre class="prettyprint"><code>./a.out
time=905721532
time=895393507
time=886537634
time=879836476
time=883887384
</code></pre>

This is with another 0 added to N - so 10 times longer runtime. <code>func2</code> seems fairly consistently a little SLOWER, but only by a few percent, which is probably within the noise, really. In wall-clock time, the whole benchmark takes around 1.30-1.39 seconds.
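The <code>rdtsc</code> helper above is specific to x86 and to GCC/Clang inline assembly. As a portable alternative, here is a minimal sketch built on <code>std::chrono::steady_clock</code>. It is not what produced the numbers above, and the name <code>best_time_ns</code> and the choice of reporting the fastest of several runs are my own assumptions, one common way of damping measurement noise.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper: calls `work` `repeats` times and returns the
// fastest single run in nanoseconds, measured with a monotonic clock.
template <typename F>
std::int64_t best_time_ns(F&& work, int repeats = 5) {
    assert(repeats > 0);
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (ns < best) best = ns;  // keep the least disturbed run
    }
    return best;
}
```

With the benchmark above, <code>best_time_ns(func1)</code> and <code>best_time_ns(func2)</code> would give one figure per variant; the results are in nanoseconds rather than TSC ticks, so they are not directly comparable with the printed <code>time=</code> values.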
Edit: Looking at the assembly code of the actual loop [this is only a portion of the loop, but the rest is identical in terms of what the code actually does]

Func1:
<pre class="prettyprint"><code>.LBB0_1:                                # %for.body
    callq   rand
    movslq  %eax, %rcx
    imulq   $1759218605, %rcx, %rcx # imm = 0x68DB8BAD
    movq    %rcx, %rdx
    shrq    $63, %rdx
    sarq    $44, %rcx
    addl    %edx, %ecx
    imull   $10000, %ecx, %ecx      # imm = 0x2710
    negl    %ecx
    leal    1(%rax,%rcx), %esi
    movq    %r15, %rdi
    callq   _Z26GetStringFromFizzBuzzLogici
    movq    (%rsp), %rax
    leaq    -24(%rax), %rdi
    cmpq    %rbx, %rdi
    jne     .LBB0_2
.LBB0_7:                                # %_ZNSsD2Ev.exit
    decl    %ebp
    jne     .LBB0_1
</code></pre>

Func2:
<pre class="prettyprint"><code>.LBB1_1:
    callq   rand
    movslq  %eax, %rcx
    imulq   $1759218605, %rcx, %rcx # imm = 0x68DB8BAD
    movq    %rcx, %rdx
    shrq    $63, %rdx
    sarq    $44, %rcx
    addl    %edx, %ecx
    imull   $10000, %ecx, %ecx      # imm = 0x2710
    negl    %ecx
    leal    1(%rax,%rcx), %esi
    movq    %rbx, %rdi
    callq   _Z26GetStringFromFizzBuzzLogici
    movq    %r14, %rdi
    movq    %rbx, %rsi
    callq   _ZNSs4swapERSs
    movq    (%rsp), %rax
    leaq    -24(%rax), %rdi
    cmpq    %r12, %rdi
    jne     .LBB1_4
.LBB1_9:                                # %_ZNSsD2Ev.exit19
    incl    %ebp
    cmpl    $5000000, %ebp          # imm = 0x4C4B40
</code></pre>

So, as can be seen, the <code>func2</code> version contains an extra function call:
<pre class="prettyprint"><code>    callq   _ZNSs4swapERSs
</code></pre>

which translates to <code>std::basic_string<char, std::char_traits<char>, std::allocator<char> >::swap(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)</code>, or <code>std::string::swap(std::string&)</code> - which is presumably the result of calling the move-assignment operator <code>std::string::operator=(std::string&&)</code> on the returned temporary. This would explain why <code>func2</code> is slightly slower than <code>func1</code>.

I'm sure it is possible to find cases where constructing/destroying an object in a loop takes a significant amount of time, but in general it will make little or no difference at all, and having clearer code will actually help the reader.
Declaring the variable in the innermost scope will also often help the compiler with "life-time analysis", since there is less code to "walk" to find out whether the variable is used later (in this case the code is short anyway, but that's obviously not always the case in real-life examples).
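To see where the extra work in the hoisted form comes from without reading assembly, one option is to count special member function calls on an instrumented type. The sketch below does that; <code>Tracked</code>, <code>make</code>, <code>declare_inside</code> and <code>declare_outside</code> are hypothetical names standing in for <code>std::string</code>, <code>GetStringFromFizzBuzzLogic</code>, <code>func1</code> and <code>func2</code>. It counts operations only and says nothing about how expensive each one is for a real type.

```cpp
// Hypothetical instrumented type: counts how often its special members run.
struct Tracked {
    static int constructed;
    static int destroyed;
    static int assigned;

    Tracked() { ++constructed; }
    explicit Tracked(int) { ++constructed; }
    Tracked(const Tracked&) { ++constructed; }
    Tracked(Tracked&&) noexcept { ++constructed; }
    Tracked& operator=(const Tracked&) { ++assigned; return *this; }
    Tracked& operator=(Tracked&&) noexcept { ++assigned; return *this; }
    ~Tracked() { ++destroyed; }
};
int Tracked::constructed = 0;
int Tracked::destroyed = 0;
int Tracked::assigned = 0;

// Stand-in for a function that returns an object by value.
Tracked make(int n) { return Tracked(n); }

struct Counts { int constructed; int destroyed; int assigned; };

void reset_counts() {
    Tracked::constructed = Tracked::destroyed = Tracked::assigned = 0;
}

// Same shape as func1: the object is declared inside the loop.
Counts declare_inside(int iterations) {
    reset_counts();
    for (int i = 0; i < iterations; ++i) {
        Tracked t = make(i);  // initialised from the return value, destroyed each iteration
    }
    return {Tracked::constructed, Tracked::destroyed, Tracked::assigned};
}

// Same shape as func2: the object is declared once, outside the loop.
Counts declare_outside(int iterations) {
    reset_counts();
    {
        Tracked t;            // one up-front default construction
        for (int i = 0; i < iterations; ++i) {
            t = make(i);      // temporary constructed, assigned from, then destroyed
        }
    }                         // t destroyed here, before the counters are read
    return {Tracked::constructed, Tracked::destroyed, Tracked::assigned};
}
```

With copy elision (guaranteed for these returns since C++17), three iterations of <code>declare_inside</code> give 3 constructions, 3 destructions and 0 assignments, while <code>declare_outside</code> gives 4 constructions, 4 destructions and 3 assignments. The hoisted form does not save any constructions when the value comes back from a function by value; it only adds the assignments, which matches the extra <code>swap</code> call seen in <code>func2</code>.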


Tags:

c++

performance

declaration
