Why is the performance of a running program getting better over time?

Tags: c++, c++11, x86-64

Consider the following code:

#include <iostream>
#include <chrono>

using Time = std::chrono::high_resolution_clock;
using us = std::chrono::microseconds;

int main()
{
    volatile int i, k;
    const int n = 1000000;

    for(k = 0; k < 200; ++k) {
            auto begin = Time::now();
            for (i = 0; i < n; ++i);  // <--
            auto end = Time::now();
            auto dur = std::chrono::duration_cast<us>(end - begin).count();
            std::cout << dur << std::endl;
    }

    return 0;
}

I am repeatedly measuring the execution time of the inner for loop. The results are shown in the following plot (y: duration, x: repetition):

[plot: measured loop duration (µs) vs. repetition, decreasing over the first repetitions]

What is causing the decrease in the loop execution time?

Environment: Linux (kernel 4.2) on an Intel i7-2600, compiled with: g++ -std=c++11 main.cpp -O0 -o main

Edit 1

The question is not about compiler optimizations or benchmarking methodology. The question is why the performance gets better over time. I am trying to understand what is happening at run time.

Edit 2

As proposed by Vaughn Cato, I have changed the CPU frequency scaling policy to "Performance". Now I am getting the following results:

[plot: measured loop duration (µs) vs. repetition with the "performance" governor]

It confirms Vaughn Cato's conjecture. Sorry for the silly question.

asked Apr 09 '16 by sergej


2 Answers

What you are probably seeing is CPU frequency scaling (throttling). The CPU goes into a low-frequency state to save power when it isn't being heavily used.

Just before running your program, the CPU clock speed is probably fairly low, since there is no big load. When you run your program, the busy loop increases the load, and the CPU clock speed goes up until you hit the maximum clock speed, decreasing your times.

If you run your program several times in a row, you'll probably see the times stay at a lower value after the first run.
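
One way to check this from inside a program is to read the current core frequency that the Linux cpufreq driver exposes through sysfs. What follows is only a minimal sketch, not part of the original answer; it assumes the standard sysfs path and looks at cpu0 only:

#include <fstream>
#include <iostream>

// Current frequency of cpu0 in kHz, as reported by the cpufreq driver,
// or -1 if the sysfs file cannot be read.
long cur_freq_khz()
{
    std::ifstream f("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
    long khz = -1;
    f >> khz;
    return khz;
}

int main()
{
    std::cout << "before warm-up: " << cur_freq_khz() << " kHz\n";
    volatile int i;
    for (i = 0; i < 100000000; ++i);   // busy loop so the governor can ramp up
    std::cout << "after warm-up:  " << cur_freq_khz() << " kHz\n";
}

If the second value is clearly higher than the first, frequency scaling is indeed kicking in during the run.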

answered Nov 15 '22 by Vaughn Cato


In your original experiment, there are too many variables that can affect the measurements:

  • the use of your processor by other active processes (i.e. the scheduling decisions of your OS; a pinning sketch follows this list)
  • whether your loop is optimized away or not
  • access to and buffering of the console (I/O)
  • the initial mode of your CPU (see the answer about throttling)
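
For the first point, scheduling noise can be reduced by pinning the process to a single core. Below is a minimal, Linux-specific sketch (not from the original answer), assuming a glibc system:

#include <sched.h>    // sched_setaffinity, CPU_ZERO, CPU_SET (need _GNU_SOURCE; g++ defines it on Linux)
#include <cstdio>

// Pin the calling process to the given core; returns true on success.
bool pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;   // pid 0 = this process
}

int main()
{
    if (!pin_to_core(0))
        std::perror("sched_setaffinity");
    // ... run the timing loop from the question here ...
}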

I must admit that I was very skeptical about your observations. I therefore wrote a small variant using a preallocated vector, to avoid I/O synchronisation effects:

// same Time/us aliases and headers as in the question, plus <vector>
volatile int i, k;
const int n = 1000000, kmax = 200, n_avg = 30;
std::vector<long> v(kmax, 0);   // results are stored here; no I/O in the timed loop

for (k = 0; k < kmax; ++k) {
    auto begin = Time::now();
    for (i = 0; i < n; ++i);    // <-- not optimized away, thanks to volatile
    auto end = Time::now();
    auto dur = std::chrono::duration_cast<us>(end - begin).count();
    v[k] = dur;
}

I then ran it several times on ideone (where, given the scale of its use, we can assume that on average the processor is under constant load). Indeed, your observations seemed to be confirmed.

My guess was that this could be related to branch prediction, which should improve with the repetitive pattern.

I went on, however, updated the code slightly, and added a loop to repeat the experiment several times. Then I also started to get runs where your observation was not confirmed (i.e. at the end, the time was higher). But it may also be that the many other processes running on ideone influence the branch prediction in a different manner.

So in the end, drawing any conclusion would require a more careful experiment, on a machine running this benchmark (and only this benchmark) for a couple of hours.
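
As one ingredient of such a protocol, here is a rough sketch (not from the original answer) that compares robust summaries of the collected durations instead of individual samples:

#include <algorithm>
#include <vector>

// Minimum and median of one run of duration samples (microseconds).
// Assumes the vector is non-empty.
struct Summary { long min; long median; };

Summary summarise(std::vector<long> v)      // by value: sort a copy
{
    std::sort(v.begin(), v.end());
    return { v.front(), v[v.size() / 2] };
}

Comparing, say, the summary of the first 50 entries of v with that of the last 50 would show whether the early iterations are systematically slower, independently of occasional scheduling spikes.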

answered Nov 15 '22 by Christophe