For readability, I think the first code block below is better. But is the second code block faster?
First Block:
for (int i = 0; i < 5000; i++){
int number = rand() % 10000 + 1;
string fizzBuzz = GetStringFromFizzBuzzLogic(number);
}
Second Block:
int number;
string fizzBuzz;
for (int i = 0; i < 5000; i++){
number = rand() % 10000 + 1;
fizzBuzz = GetStringFromFizzBuzzLogic(number);
}
Does redeclaring variables in C++ cost anything?
C allows a global variable to be declared again when the first declaration doesn't initialize the variable.
Before you can use a variable in C, you must declare it. Variable declarations show up in three places, the first being outside a function: these declarations declare global variables that are visible throughout the program (i.e. they have global scope).
Modern C compilers such as gcc and clang support the C99 and C11 standards, which allow you to declare a variable anywhere a statement could go. The variable's scope starts from the point of the declaration to the end of the block (next closing brace). You can also declare variables inside for loop initializers.
It's not a problem to define a variable within a loop. In fact, it's good practice, since identifiers should be confined to the smallest possible scope.
Any modern compiler will notice this and do the optimization work. When in doubt, always go for readability. Declare variables in the innermost scope you can.
I benchmarked this particular code, and even WITHOUT optimisation, it came to almost the same runtime for both variants. And as soon as the lowest level of optimisation is turned on, the result is very close to identical (+/- a bit of noise in the time measurement).
Edit: the analysis of the generated assembler code below shows that it's hard to guess which form is faster. The answer most people would probably give is func2, but it turns out this function is a tiny bit slower, at least when compiling with clang++ and -O2. This is good evidence that "write code, benchmark, change code, benchmark" is the correct way to deal with performance, rather than guessing based on reading the code. And remember what someone told me: optimising is a bit like taking an onion apart in layers - once you optimise one part, you end up looking at something very similar, just a little smaller... ;)
However, my initial analysis made func1 significantly slower. That turns out to be because the compiler, for some bizarre reason, doesn't optimise the rand() % 10000 + 1 in func1 but does in func2 when optimisation is turned off. This means that func1 is slower than func2 when compiled without optimisation. However, once optimisation is enabled, both functions get a "fast" modulo.
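The "fast" modulo replaces the division instruction with a multiply and a shift. Below is a minimal C++ sketch of that trick, reusing the multiplier 1759218605 (which equals ceil(2^44 / 10000)) and the shift by 44 that appear in the clang -O2 loop listings in this answer. The function name fastMod10000Plus1 is hypothetical, and the sketch assumes a non-negative input, which rand() always returns; the compiler's version additionally adds the sign bit (the shrq $63 / addl pair) so that negative values round toward zero, which is omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes x % 10000 + 1 without a division instruction.
// Assumes x >= 0, which holds for every value returned by rand().
// 1759218605 == ceil(2^44 / 10000), the constant clang emits for "/ 10000".
inline std::int32_t fastMod10000Plus1(std::int32_t x)
{
    assert(x >= 0 && "sketch only handles non-negative input");
    // Both factors are below 2^31, so the product fits in 64 bits.
    std::uint64_t q = (static_cast<std::uint64_t>(x) * 1759218605ULL) >> 44; // q == x / 10000
    // Remainder = x - q * 10000, then add 1 as in the original expression.
    return x - static_cast<std::int32_t>(q) * 10000 + 1;
}
```

For any value returned by rand(), fastMod10000Plus1(rand()) should agree with rand() % 10000 + 1 on the same value; the error introduced by rounding the multiplier up is too small to change the quotient for any non-negative 32-bit input.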
Running the Linux performance tool perf shows that, with clang++ and -O2, we get the following for func1:
15.76% a.out libc-2.20.so free
12.31% a.out libstdc++.so.6.0.20 std::string::_S_construct<char cons
12.29% a.out libc-2.20.so _int_malloc
10.05% a.out a.out func1
7.26% a.out libc-2.20.so __random
6.36% a.out libc-2.20.so malloc
5.46% a.out libc-2.20.so __random_r
5.01% a.out libstdc++.so.6.0.20 std::basic_string<char, std::char_t
4.83% a.out libstdc++.so.6.0.20 std::string::_Rep::_S_create
4.01% a.out libc-2.20.so strlen
and for func2:
17.88% a.out libc-2.20.so free
10.73% a.out libc-2.20.so _int_malloc
9.77% a.out libc-2.20.so malloc
9.03% a.out a.out func2
7.63% a.out libstdc++.so.6.0.20 std::string::_S_construct<char con
6.96% a.out libstdc++.so.6.0.20 std::string::_Rep::_S_create
4.48% a.out libc-2.20.so __random
4.39% a.out libc-2.20.so __random_r
4.10% a.out libc-2.20.so strlen
There are some subtle differences, but I would attribute those more to the relatively short runtime of the benchmark than to any difference in the code actually generated by the compiler.
This is with the following code:
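#include <iostream>
#include <string>
#include <cstdlib>

// Iterations per call (the timings below were taken with one more 0, i.e. 5000000).
#define N 500000

// Defined in a separate file, so (without link-time optimisation) it cannot be inlined here.
extern std::string GetStringFromFizzBuzzLogic(int number);

// First block from the question: variables declared inside the loop.
void func1()
{
    for (int i = 0; i < N; i++){
        int number = rand() % 10000 + 1;
        std::string fizzBuzz = GetStringFromFizzBuzzLogic(number);
    }
}

// Second block from the question: variables declared once, before the loop.
void func2()
{
    int number;
    std::string fizzBuzz;
    for (int i = 0; i < N; i++){
        number = rand() % 10000 + 1;
        fizzBuzz = GetStringFromFizzBuzzLogic(number);
    }
}

// Reads the x86 time-stamp counter (GCC/clang inline assembly, x86/x86-64 only).
static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}

int main(int argc, char **argv)
{
    // No command-line arguments selects func1; any argument selects func2.
    void (*f)();
    if (argc == 1)
        f = func1;
    else
        f = func2;
    // Time five calls and print the elapsed ticks for each.
    for(int i = 0; i < 5; i++)
    {
        unsigned long long t1 = rdtsc();
        f();
        t1 = rdtsc() - t1;
        std::cout << "time=" << t1 << std::endl;
    }
}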
and in a separate file:
#include <string>

// Stub: returns the same string whatever the number.
std::string GetStringFromFizzBuzzLogic(int number)
{
    return "SomeString";
}
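rdtsc is x86-specific and counts time-stamp-counter ticks rather than seconds. As a portable alternative, each call could be timed with std::chrono::steady_clock along the lines of the sketch below. This is not the code used for the measurements in this answer, and the helper name timeNanoseconds is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs f once and returns the elapsed wall-clock time in
// nanoseconds, measured with the monotonic std::chrono::steady_clock.
template <typename F>
std::int64_t timeNanoseconds(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    assert(stop >= start); // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

In main, std::cout << "time=" << timeNanoseconds(f) << " ns" << std::endl; could then replace the pair of rdtsc() calls. steady_clock is chosen over system_clock because it is guaranteed not to jump when the system time is adjusted.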
Running with func1:
./a.out
time=876016390
time=824149942
time=826812600
time=825266315
time=826151399
Running with func2 (any command-line argument selects func2):
./a.out
time=905721532
time=895393507
time=886537634
time=879836476
time=883887384
This is with another 0 added to N (so 10 times longer runtime). func2 seems to be fairly consistently a little SLOWER, but only by a few percent, which is probably within the noise, really. In time, the whole benchmark takes around 1.30-1.39 seconds.
Edit: Looking at the assembly code of the actual loop [this is only a portion of the loop, but the rest is identical in terms of what the code actually does]:
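To put a number on "a few percent", the five timings printed for each function above can be averaged. The sketch below does that arithmetic on the posted rdtsc counts; the helper names mean and relativeDifference are hypothetical. On these runs it works out to roughly 6.5% more ticks for func2, with only five samples per function.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: arithmetic mean of a set of rdtsc tick counts.
double mean(const std::vector<unsigned long long>& ticks)
{
    assert(!ticks.empty());
    // Accumulate into a double so large tick counts cannot overflow.
    double sum = std::accumulate(ticks.begin(), ticks.end(), 0.0);
    return sum / static_cast<double>(ticks.size());
}

// Relative difference of b compared to a: 0.05 means b is 5% larger than a.
double relativeDifference(double a, double b)
{
    return (b - a) / a;
}

// The five "time=" values printed for each function in this answer.
const std::vector<unsigned long long> func1Ticks = {876016390, 824149942, 826812600, 825266315, 826151399};
const std::vector<unsigned long long> func2Ticks = {905721532, 895393507, 886537634, 879836476, 883887384};
```

relativeDifference(mean(func1Ticks), mean(func2Ticks)) evaluates to about 0.065; the means are roughly 835.7 million and 890.3 million ticks.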
Func1:
.LBB0_1: # %for.body
callq rand
movslq %eax, %rcx
imulq $1759218605, %rcx, %rcx # imm = 0x68DB8BAD
movq %rcx, %rdx
shrq $63, %rdx
sarq $44, %rcx
addl %edx, %ecx
imull $10000, %ecx, %ecx # imm = 0x2710
negl %ecx
leal 1(%rax,%rcx), %esi
movq %r15, %rdi
callq _Z26GetStringFromFizzBuzzLogici
movq (%rsp), %rax
leaq -24(%rax), %rdi
cmpq %rbx, %rdi
jne .LBB0_2
.LBB0_7: # %_ZNSsD2Ev.exit
decl %ebp
jne .LBB0_1
Func2:
.LBB1_1:
callq rand
movslq %eax, %rcx
imulq $1759218605, %rcx, %rcx # imm = 0x68DB8BAD
movq %rcx, %rdx
shrq $63, %rdx
sarq $44, %rcx
addl %edx, %ecx
imull $10000, %ecx, %ecx # imm = 0x2710
negl %ecx
leal 1(%rax,%rcx), %esi
movq %rbx, %rdi
callq _Z26GetStringFromFizzBuzzLogici
movq %r14, %rdi
movq %rbx, %rsi
callq _ZNSs4swapERSs
movq (%rsp), %rax
leaq -24(%rax), %rdi
cmpq %r12, %rdi
jne .LBB1_4
.LBB1_9: # %_ZNSsD2Ev.exit19
incl %ebp
cmpl $5000000, %ebp # imm = 0x4C4B40
So, as can be seen, the func2 version contains an extra function call:
callq _ZNSs4swapERSs
which demangles to std::basic_string<char, std::char_traits<char>, std::allocator<char> >::swap(std::basic_string<char, std::char_traits<char>, std::allocator<char> >&), or std::string::swap(std::string&). This is presumably the result of the assignment fizzBuzz = GetStringFromFizzBuzzLogic(number), which calls the move-assignment operator std::string::operator=(std::string&&); the libstdc++ version used here appears to implement that as a swap. This would explain why func2 is slightly slower than func1.
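The extra call comes from func2 assigning a freshly returned value to a variable that already exists, whereas func1 constructs its variable directly from the return value. A minimal sketch with a hypothetical instrumented type (Tracked, standing in for std::string) and a hypothetical makeTracked (standing in for GetStringFromFizzBuzzLogic) makes the difference in special-member calls visible. It assumes C++17, where returning a prvalue such as Tracked{n} guarantees copy elision; older standards let compilers elide the copy but do not require it.

```cpp
#include <cassert>

// Hypothetical instrumented type standing in for std::string: it counts how
// often it is constructed and how often it is move-assigned.
struct Tracked {
    static inline int constructions = 0;   // all constructor calls (C++17 inline variable)
    static inline int moveAssignments = 0;

    int value = 0;
    Tracked() { ++constructions; }
    explicit Tracked(int v) : value(v) { ++constructions; }
    Tracked(const Tracked& other) : value(other.value) { ++constructions; }
    Tracked(Tracked&& other) noexcept : value(other.value) { ++constructions; }
    Tracked& operator=(const Tracked& other) { value = other.value; return *this; }
    Tracked& operator=(Tracked&& other) noexcept { value = other.value; ++moveAssignments; return *this; }
};

// Hypothetical stand-in for GetStringFromFizzBuzzLogic: returns a prvalue.
Tracked makeTracked(int n) { return Tracked{n}; }

void resetCounters() { Tracked::constructions = 0; Tracked::moveAssignments = 0; }

// Like func1: the return value is constructed directly into t (no move, no assignment).
void declareInside(int iterations)
{
    for (int i = 0; i < iterations; i++) {
        Tracked t = makeTracked(i);
        (void)t;
    }
}

// Like func2: one default construction up front, then a temporary plus a
// move assignment on every iteration.
void declareOutside(int iterations)
{
    Tracked t;
    for (int i = 0; i < iterations; i++) {
        t = makeTracked(i);
    }
}
```

With 5 iterations, declareInside performs 5 constructions and no move assignments, while declareOutside performs 6 constructions (one default, five temporaries) and 5 move assignments. For the libstdc++ std::string in this answer, each of those move assignments is the swap seen in the assembly.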
I'm sure it is possible to find cases where constructing/destroying an object takes a significant amount of time in a loop, but in general, it will make little or no difference at all, and having clearer code will actually help the reader. It will also often help the compiler with "lifetime analysis", since there is less code to "walk" to find out whether the variable is used later (in this case, the code is short anyway, but that's obviously not always the case in real-life examples).
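One common case where declaring a variable outside the loop can pay off is when it serves as a reusable buffer, so that its allocated memory survives from one iteration to the next. The sketch below changes the interface to an output parameter: appendFizzBuzz is a hypothetical variant of GetStringFromFizzBuzzLogic that applies the usual FizzBuzz rules (the stub in this answer does not), and forEachFizzBuzz is a hypothetical driver. It relies on clear() keeping the string's capacity, which is what mainstream standard library implementations do in practice; as above, benchmarking is the way to find out whether it actually helps.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of GetStringFromFizzBuzzLogic that appends its result
// to a caller-supplied buffer instead of returning a new std::string.
void appendFizzBuzz(int number, std::string& out)
{
    assert(number > 0);
    if (number % 15 == 0)      out += "FizzBuzz";
    else if (number % 3 == 0)  out += "Fizz";
    else if (number % 5 == 0)  out += "Buzz";
    else                       out += std::to_string(number);
}

// Hypothetical driver: the buffer is declared outside the loop on purpose and
// reused on every iteration. The consumer must not keep a reference to it.
template <typename Consumer>
void forEachFizzBuzz(int n, Consumer&& consume)
{
    std::string buffer;
    for (int i = 1; i <= n; i++) {
        buffer.clear();            // length becomes 0; capacity is typically kept
        appendFizzBuzz(i, buffer);
        consume(buffer);
    }
}
```

For example, forEachFizzBuzz(15, [&](const std::string& s) { total += s.size(); }); adds up the lengths of "1", "2", "Fizz", ..., "FizzBuzz".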
The first code block should be considered faster, since it avoids the overhead of calling the std::string default constructor once before the loop.
Actually, you don't have a redeclaration of the variables in your second code block; these are just plain assignment operations.
A redeclaration would actually mean you have something like this:
int number;
string fizzBuzz;
for (int i = 0; i < 5000; i++){
int number = rand() % 10000 + 1;
// ^^^
string fizzBuzz = GetStringFromFizzBuzzLogic(number);
// ^^^^^^
}
In this case the overhead would be optimized out by the compiler, since the outer scope variables aren't used at all.
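Strictly speaking, the inner declarations in that snippet shadow the outer variables rather than redeclare them in the same scope (a true redeclaration in one scope, such as int number; int number;, does not compile in C++). A small sketch with a hypothetical function name shows that the loop body works on its own variable and leaves the outer one untouched:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical demo: the loop body's `number` is a new variable that hides
// (shadows) the outer one until the end of the loop body.
int outerNumberAfterShadowingLoop()
{
    int number = -1;                          // outer variable
    for (int i = 0; i < 5000; i++) {
        int number = std::rand() % 10000 + 1; // inner variable, shadows the outer one
        assert(number >= 1 && number <= 10000);
    }
    return number;                            // outer variable, never assigned in the loop
}
```

The function returns -1, confirming that the inner number is a separate object; compilers can warn about this pattern (for example with -Wshadow in GCC and clang).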