So I wanted to see by how much the performance of a program can be improved by not copying the value of a variable into another variable before comparing it (this will be explained better in the examples), and I noticed something weird. I had these two code segments:
string a = "";
for (int i = 0; i < 1000000; i++) a += 'a';
for (int i = 0; i < 1000000; i++) {
if ('b' == a.at(i));//compare the two chars directly
}
and
string a = "";
for (int i = 0; i < 100000000; i++) a += 'a';
for (int i = 0; i < 100000000; i++) {
char c = a.at(i);//declare a new variable
if ('b' == c);//compare the char with the newly created variable,
//instead of comparing it to the other char directly
}
I thought that the second segment would take longer to be executed, since there there is one more variable declared, compared to the first segment. When I actually timed the two I found that the second one took less time than the first one. I timed it a few times, and the second one always seems to take around 0.13 seconds less time to be executed. Here is the complete code:
#include <string>
#include <iostream>
#include <ctime>
using namespace std;
int main() {
clock_t timer;
string a = "";
string b;
for (int i = 0; i < 100000000; i++)
a += "a";
timer = clock();
for (int i = 0; i < 100000000; i++) {
if ('b'==a.at(i)) b += "a";
}
cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;
timer = clock();
for (int i = 0; i < 100000000; i++) {
char c = a.at(i);
if ('b'==c) b += "a";
}
cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;
return 0;
}
Why does this happen?
EDIT: I followed the suggestion of NathanOliver and I addedd separate strings for each loop, so now the code looks like this:
#include <string>
#include <iostream>
#include <ctime>
using namespace std;
int main() {
clock_t timer;
string compare_string_1 = "";
string compare_string_2 = "";
string segment_1 = "";
string segment_2 = "";
for (int i = 0; i < 100000000; i++)
compare_string_1 += "a";
for (int i = 0; i < 100000000; i++)
compare_string_2 += "a";
timer = clock();
for (int i = 0; i < 100000000; i++) {
if ('b'==compare_string_1.at(i)) segment_1 += "a";
}
cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;
timer = clock();
for (int i = 0; i < 100000000; i++) {
char c = compare_string_2.at(i);
if ('b'==c) segment_2 += "a";
}
cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;
return 0;
}
Execution time refers to the stage at which the instructions in the computer programs/code are executed. At execution time, run-time libraries are used. Some basic operations that occur at execution time include reading program instructions to carry out tasks or complete actions.
The difference between the end time and start time is the execution time. Get the execution time by subtracting the start time from the end time.
Using the DateTime module check the execution time Using the datetime module in Python and datetime. now() function to record timestamp of start and end instance and finding the difference to get the code execution time.
Using Visual C++ 2010, I got the same timing results as in the comments above - on average the second loop takes about 80% of the runtime of the first one. One or two times, the first loop was a bit faster, but this may be due to some thread hiccup in the OS. Checking the disassembly yielded the following:
First loop:
01231120 cmp dword ptr [ebp-38h],esi
01231123 jbe main+1CBh (123120Bh)
01231129 cmp dword ptr [ebp-34h],10h
0123112D mov eax,dword ptr [ebp-48h]
01231130 jae main+0F5h (1231135h)
01231132 lea eax,[ebp-48h]
01231135 cmp byte ptr [eax+esi],62h
01231139 jne main+108h (1231148h)
0123113B mov ebx,1
01231140 lea eax,[ebp-80h]
01231143 call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)
01231148 inc esi
01231149 cmp esi,5F5E100h
0123114F jl main+0E0h (1231120h)
Second loop:
01231155 cmp dword ptr [ebp-1Ch],esi
01231158 jbe main+1CBh (123120Bh)
0123115E cmp dword ptr [ebp-18h],10h
01231162 mov eax,dword ptr [ebp-2Ch]
01231165 jae main+12Ah (123116Ah)
01231167 lea eax,[ebp-2Ch]
0123116A cmp byte ptr [eax+esi],62h
0123116E jne main+13Dh (123117Dh)
01231170 mov ebx,1
01231175 lea eax,[ebp-64h]
01231178 call std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)
0123117D inc esi
0123117E cmp esi,5F5E100h
01231184 jl main+115h (1231155h)
As the generated assembly looks more or less the same, I thought about throttling mechanisms within the OS or the CPU and guess what? Adding a Sleep(5000); between the two loops caused the second loop to (almost) always be slower than the first one. Running it 20 times, the second loop took around 150% of the first one's runtime on average.
EDIT: Increasing the spincount fivefold gives the same results. I assume runtimes around 0.5s are more or less reliably measurable. :-)
In the original code, I think, the OS may need a few timeslices to detect the CPU load and then starts giving the thread more priority during scheduling, also the CPU may boost after while, leaving parts of the first loop "unboosted". When the second loop starts executing, the OS/CPU may be primed for the heavy workload and execute a bit faster. The same could happen with MMU or OS-internal memory page handling. When adding the Sleep between the loops the opposite may happen, causing the OS to put the thread aside for some time until the new workload is being eventually detected, making the second loop execute a bit slower.
What are your results? Does anybody have a proper profiler like Intel's Amplifier at hand, to measure CPI-rates and CPU-speed-stepping within the loops?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With