Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weird difference in execution time between two code segments

So I wanted to see by how much the performance of a program can be improved by not copying the value of a variable into another variable before comparing it (this will be explained better in the examples), and I noticed something weird. I had these two code segments:

string a = "";
for (int i = 0; i < 1000000; i++) a += 'a';

for (int i = 0; i < 1000000; i++) {
    if ('b' == a.at(i));//compare the two chars directly
}

and

string a = "";
for (int i = 0; i < 100000000; i++) a += 'a';

for (int i = 0; i < 100000000; i++) {
    char c = a.at(i);//declare a new variable
    if ('b' == c);//compare the char with the newly created variable,
                  //instead of comparing it to the other char directly
}

I thought that the second segment would take longer to be executed, since there there is one more variable declared, compared to the first segment. When I actually timed the two I found that the second one took less time than the first one. I timed it a few times, and the second one always seems to take around 0.13 seconds less time to be executed. Here is the complete code:

#include <string>
#include <iostream>
#include <ctime>

using namespace std;

int main() {
    clock_t timer;

    string a = "";
    string b;

    for (int i = 0; i < 100000000; i++)
        a += "a";

    timer = clock();

    for (int i = 0; i < 100000000; i++) {
        if ('b'==a.at(i)) b += "a";
    }

    cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;

    timer = clock();

    for (int i = 0; i < 100000000; i++) {
        char c = a.at(i);
        if ('b'==c) b += "a";
    }

    cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;

    return 0;
}

Why does this happen?

EDIT: I followed the suggestion of NathanOliver and I addedd separate strings for each loop, so now the code looks like this:

#include <string>
#include <iostream>
#include <ctime>

using namespace std;

int main() {
    clock_t timer;

    string compare_string_1 = "";
    string compare_string_2 = "";
    string segment_1 = "";
    string segment_2 = "";

    for (int i = 0; i < 100000000; i++)
        compare_string_1 += "a";

    for (int i = 0; i < 100000000; i++)
        compare_string_2 += "a";

    timer = clock();

    for (int i = 0; i < 100000000; i++) {
        if ('b'==compare_string_1.at(i)) segment_1 += "a";
    }

    cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;

    timer = clock();

    for (int i = 0; i < 100000000; i++) {
        char c = compare_string_2.at(i);
        if ('b'==c) segment_2 += "a";
    }

    cout << (clock()-timer)/(float)CLOCKS_PER_SEC << "sec" << endl;

    return 0;
}
like image 406
zomnombom Avatar asked Jun 07 '16 13:06

zomnombom


People also ask

What is execution time in programming?

Execution time refers to the stage at which the instructions in the computer programs/code are executed. At execution time, run-time libraries are used. Some basic operations that occur at execution time include reading program instructions to carry out tasks or complete actions.

How do you calculate Execution time?

The difference between the end time and start time is the execution time. Get the execution time by subtracting the start time from the end time.

How to check execution time of for loop in python?

Using the DateTime module check the execution time Using the datetime module in Python and datetime. now() function to record timestamp of start and end instance and finding the difference to get the code execution time.


1 Answers

Using Visual C++ 2010, I got the same timing results as in the comments above - on average the second loop takes about 80% of the runtime of the first one. One or two times, the first loop was a bit faster, but this may be due to some thread hiccup in the OS. Checking the disassembly yielded the following:

First loop:

01231120  cmp         dword ptr [ebp-38h],esi  
01231123  jbe         main+1CBh (123120Bh)  
01231129  cmp         dword ptr [ebp-34h],10h  
0123112D  mov         eax,dword ptr [ebp-48h]  
01231130  jae         main+0F5h (1231135h)  
01231132  lea         eax,[ebp-48h]  
01231135  cmp         byte ptr [eax+esi],62h  
01231139  jne         main+108h (1231148h)  
0123113B  mov         ebx,1  
01231140  lea         eax,[ebp-80h]  
01231143  call        std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)  
01231148  inc         esi  
01231149  cmp         esi,5F5E100h  
0123114F  jl          main+0E0h (1231120h)

Second loop:

01231155  cmp         dword ptr [ebp-1Ch],esi  
01231158  jbe         main+1CBh (123120Bh)  
0123115E  cmp         dword ptr [ebp-18h],10h  
01231162  mov         eax,dword ptr [ebp-2Ch]  
01231165  jae         main+12Ah (123116Ah)  
01231167  lea         eax,[ebp-2Ch]  
0123116A  cmp         byte ptr [eax+esi],62h  
0123116E  jne         main+13Dh (123117Dh)  
01231170  mov         ebx,1  
01231175  lea         eax,[ebp-64h]  
01231178  call        std::basic_string<char,std::char_traits<char>,std::allocator<char> >::append (1231250h)  
0123117D  inc         esi  
0123117E  cmp         esi,5F5E100h  
01231184  jl          main+115h (1231155h) 

As the generated assembly looks more or less the same, I thought about throttling mechanisms within the OS or the CPU and guess what? Adding a Sleep(5000); between the two loops caused the second loop to (almost) always be slower than the first one. Running it 20 times, the second loop took around 150% of the first one's runtime on average.

EDIT: Increasing the spincount fivefold gives the same results. I assume runtimes around 0.5s are more or less reliably measurable. :-)

In the original code, I think, the OS may need a few timeslices to detect the CPU load and then starts giving the thread more priority during scheduling, also the CPU may boost after while, leaving parts of the first loop "unboosted". When the second loop starts executing, the OS/CPU may be primed for the heavy workload and execute a bit faster. The same could happen with MMU or OS-internal memory page handling. When adding the Sleep between the loops the opposite may happen, causing the OS to put the thread aside for some time until the new workload is being eventually detected, making the second loop execute a bit slower.

What are your results? Does anybody have a proper profiler like Intel's Amplifier at hand, to measure CPI-rates and CPU-speed-stepping within the loops?

like image 199
Kento Asashima Avatar answered Sep 28 '22 12:09

Kento Asashima