If I put two calls side-by-side to determine the smallest measurable time duration:
// g++ -std=c++11 -O3 -Wall test.cpp
#include <chrono>
#include <iostream>

typedef std::chrono::high_resolution_clock hrc;

// two back-to-back clock reads
hrc::time_point start = hrc::now();
hrc::time_point end = hrc::now();
std::chrono::nanoseconds duration = end - start;
std::cout << "duration: " << duration.count() << " ns" << std::endl;
I've run this thousands of times in a loop, and I consistently get 40 ns +/- 2 ns on my particular 3.40GHz desktop.
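For completeness, a minimal, self-contained sketch of that measurement loop (the iteration count and the minimum-tracking are illustrative choices, not the exact code I ran):

// Sketch: run the back-to-back now() calls many times and report the minimum.
#include <chrono>
#include <iostream>

int main() {
    typedef std::chrono::high_resolution_clock hrc;
    long long min_ns = -1;
    for (int i = 0; i < 100000; ++i) {
        hrc::time_point start = hrc::now();
        hrc::time_point end = hrc::now();
        long long ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        if (min_ns < 0 || ns < min_ns)
            min_ns = ns;
    }
    std::cout << "minimum measurable duration: " << min_ns << " ns\n";
}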
However, when I look to see what is the shortest time I can sleep:
#include <thread>
hrc::time_point start = hrc::now();
std::this_thread::sleep_for( std::chrono::nanoseconds(1) );
hrc::time_point end = hrc::now();
std::chrono::nanoseconds duration = end - start;
std::cout << "slept for: " << duration.count() << " ns" << std::endl;
This tells me I slept for 55400 nanoseconds on average, or 55.4 microseconds, which is much greater than the time I expected.
Putting the above code into a for() loop (a sketch of that loop is below), I tried sleeping for different amounts and compared the requested and measured durations.
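Roughly, the sweep looked like this (the requested durations below are just examples):

// Sketch: sleep for various requested durations and print requested vs. measured time.
#include <chrono>
#include <iostream>
#include <thread>

int main() {
    typedef std::chrono::high_resolution_clock hrc;
    const long long requests_ns[] = {-100, 0, 1, 100, 1000, 100000, 1000000};
    for (long long req : requests_ns) {
        hrc::time_point start = hrc::now();
        std::this_thread::sleep_for(std::chrono::nanoseconds(req));
        hrc::time_point end = hrc::now();
        long long slept = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        std::cout << "requested " << req << " ns, slept " << slept << " ns\n";
    }
}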
Some questions I have:
What could explain these numbers?
There's a pretty obvious pattern: all your results are consistently 54000ns greater than the time you request to sleep. If you look at how GCC's this_thread::sleep_for() is implemented on GNU/Linux, you'll see it just uses nanosleep, and as Cubbi's comment says, calling that function can take around 50000ns. I would guess some of that cost is making a system call, i.e. switching from user-space to the kernel and back.
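A quick way to see that the cost is mostly the nanosleep call itself is to time a direct call to it (a rough sketch, Linux-specific, error handling omitted):

// Sketch: time a direct nanosleep(2) call to compare with sleep_for's overhead.
#include <chrono>
#include <iostream>
#include <time.h>

int main() {
    typedef std::chrono::high_resolution_clock hrc;
    struct timespec req = {0, 1}; // request 1 ns, as in the question
    hrc::time_point start = hrc::now();
    nanosleep(&req, nullptr);
    hrc::time_point end = hrc::now();
    std::cout << "nanosleep(1 ns) took "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count()
              << " ns\n";
}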
Why does sleeping for a negative amount of time return 200+ ns, while sleeping for 0+ nanoseconds results in 50,000+ nanoseconds?
At a guess I'd say that the C library checks for the negative number and doesn't make a system call.
Are negative numbers as a sleep time a documented/supported feature, or did I accidentally stumble across some strange bug I cannot rely upon?
The standard doesn't forbid passing negative arguments, so it is allowed, and the function should return "immediately" because the time specified by the relative timeout has already passed. You can't rely on negative arguments returning faster than non-negative arguments though, that's an artefact of your specific implementation.
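For example, this is well-formed and should return almost immediately, but exactly how quickly is an implementation detail:

// A negative relative timeout has already "passed", so this must not block;
// how fast it returns is an implementation detail you shouldn't rely on.
#include <chrono>
#include <thread>

int main() {
    std::this_thread::sleep_for(std::chrono::nanoseconds(-100));
}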
Is there a better C++ sleep call which would give me more consistent/predictable sleep times?
I don't think so - if I knew of one then we'd be using it in GCC to implement this_thread::sleep_for().
Edit: In more recent versions of GCC's libstdc++ I added:
if (__rtime <= __rtime.zero())
return;
so there will be no system call when a zero or negative duration is requested.
Inspired by Straight Fast’s answer, I evaluated the effects of timer_slack_ns and of SCHED_FIFO. For timer_slack_ns you have to add
#include <sys/prctl.h> // prctl
⋮
prctl (PR_SET_TIMERSLACK, 10000U, 0, 0, 0);
meaning that for the current process the timer slack shall be set to 10µs instead of the default value of 50µs. The effect is better responsiveness at the expense of slightly higher energy consumption. The process can still be run by a non-privileged user. To change the scheduler policy to SCHED_FIFO you must be “root”. The code required is
#include <unistd.h> // getpid
#include <sched.h> // sched_setscheduler
⋮
const pid_t pid {getpid ()};
struct sched_param sp = {.sched_priority = 90};
if (sched_setscheduler (pid, SCHED_FIFO, &sp) == -1) {
perror ("sched_setscheduler");
return 1;
}
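Putting both settings together, a minimal test program could look roughly like this (error handling is kept short, and the priority value 90 is just the one used above):

// Sketch: apply the timer-slack and SCHED_FIFO settings discussed above,
// then measure how long a 1 ns sleep_for actually takes.
#include <chrono>
#include <cstdio>      // perror
#include <iostream>
#include <thread>
#include <sched.h>     // sched_setscheduler
#include <sys/prctl.h> // prctl
#include <unistd.h>    // getpid

int main() {
    // Reduce the timer slack for this process from the 50 µs default to 10 µs.
    if (prctl(PR_SET_TIMERSLACK, 10000U, 0, 0, 0) == -1)
        perror("prctl");

    // Switch to the SCHED_FIFO real-time policy (requires root).
    struct sched_param sp = {};
    sp.sched_priority = 90;
    if (sched_setscheduler(getpid(), SCHED_FIFO, &sp) == -1)
        perror("sched_setscheduler");

    typedef std::chrono::high_resolution_clock hrc;
    hrc::time_point start = hrc::now();
    std::this_thread::sleep_for(std::chrono::nanoseconds(1));
    hrc::time_point end = hrc::now();
    std::cout << "slept for: "
              << std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count()
              << " ns\n";
}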
I ran Stéphane’s code snippets on a desktop system with a GUI (Debian 9.11, kernel 4.9.189-3+deb9u2, g++ 9.2 -O3, Intel® Core™ i5-3470T CPU @ 2.90GHz). The results for the first case (two consecutive time measurements) are as follows.
Because there is no system call in between, the delay is about 260ns and is not significantly affected by the process settings. For normally distributed timings the graphs would be straight lines, with the abscissa value at an ordinate of 0.5 being the mean and the slope representing the standard deviation. The measured values deviate from this in that there are outliers towards higher delays.
In contrast, the second case (sleeping one nanosecond) does differ between process setups because it involves a system call. Since the requested sleep time is so small, the sleep itself adds no measurable time; the graphs therefore show only the overhead:
As stated by Stéphane, the overhead defaults to about 64µs (it is a bit larger here). It can be reduced to about 22µs by lowering timer_slack_ns to 10µs, and by invoking the privileged sched_setscheduler() it can be cut down to about 12µs. But as the graph shows, even in this case the delay can exceed 50µs (in 0.0001% of the runs).
The measurements show the basic dependence of the overhead on the process settings. Other measurements have shown that the fluctuations are lower by more than an order of magnitude on non-GUI Xeon server systems.
In the kernel, init/init_task.c defines the default slack in struct task_struct init_task:
.timer_slack_ns = 50000, /* 50 usec default slack */
This slack is added for non-RT processes in the hrtimer_nanosleep() kernel function so that the timer's hardirqs fire less often.
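To verify the value for a given process, the current slack can be queried with PR_GET_TIMERSLACK (a small sketch, Linux-specific):

// Sketch: query the current per-process timer slack.
#include <cstdio>      // perror
#include <iostream>
#include <sys/prctl.h> // prctl

int main() {
    int slack_ns = prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0);
    if (slack_ns == -1)
        perror("prctl");
    else
        std::cout << "timer slack: " << slack_ns << " ns\n"; // 50000 by default
}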