 

std::this_thread::sleep_for() and nanoseconds

If I put two calls side-by-side to determine the smallest measurable time duration:

// g++ -std=c++11 -O3 -Wall test.cpp
#include <chrono>
#include <iostream>

typedef std::chrono::high_resolution_clock hrc;

int main() {
    hrc::time_point start = hrc::now();
    hrc::time_point end   = hrc::now();
    std::chrono::nanoseconds duration = end - start;
    std::cout << "duration: " << duration.count() << " ns" << std::endl;
}

I've run this thousands of times in a loop, and I consistently get 40 ns +/- 2 ns on my particular 3.40GHz desktop.

However, when I look to see what is the shortest time I can sleep:

#include <thread>
// (reuses the <chrono>/<iostream> includes and the hrc typedef from above)

hrc::time_point start = hrc::now();
std::this_thread::sleep_for( std::chrono::nanoseconds(1) );
hrc::time_point end   = hrc::now();
std::chrono::nanoseconds duration = end - start;
std::cout << "slept for: " << duration.count() << " ns" << std::endl;

This tells me I slept on average 55400 nanoseconds (55.4 microseconds), much longer than I expected.

Putting the above code into a for() loop (a sketch of the loop appears after the list below), I tried sleeping for different amounts, and this is the result:

  • sleep_for( 4000 ns ) => slept for 58000 ns
  • sleep_for( 3000 ns ) => slept for 57000 ns
  • sleep_for( 2000 ns ) => slept for 56000 ns
  • sleep_for( 1000 ns ) => slept for 55000 ns
  • sleep_for( 0 ns ) => slept for 54000 ns
  • sleep_for( -1000 ns ) => slept for 313 ns
  • sleep_for( -2000 ns ) => slept for 203 ns
  • sleep_for( -3000 ns ) => slept for 215 ns
  • sleep_for( -4000 ns ) => slept for 221 ns
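
For reference, a minimal sketch of such a sweep loop; the iteration count and the averaging are my assumptions, not the exact harness used above:

#include <chrono>
#include <iostream>
#include <thread>

typedef std::chrono::high_resolution_clock hrc;

int main() {
    // Sweep the requested sleep time from +4000 ns down to -4000 ns.
    for (long req = 4000; req >= -4000; req -= 1000) {
        const int iterations = 1000; // assumed repetition count
        long long total = 0;
        for (int i = 0; i < iterations; ++i) {
            hrc::time_point start = hrc::now();
            std::this_thread::sleep_for(std::chrono::nanoseconds(req));
            hrc::time_point end = hrc::now();
            total += std::chrono::nanoseconds(end - start).count();
        }
        std::cout << "sleep_for( " << req << " ns ) => slept for "
                  << total / iterations << " ns on average" << std::endl;
    }
}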

Some questions I have:

  • What could explain these numbers?
  • Why does sleeping for a negative amount of time take 200+ ns, while sleeping for 0+ nanoseconds takes 50,000+ ns?
  • Is a negative sleep time a documented/supported feature, or did I accidentally stumble across some strange bug I cannot rely upon?
  • Is there a better C++ sleep call which would give me more consistent/predictable sleep times?
asked Aug 06 '13 by Stéphane

3 Answers

What could explain these numbers?

There's a pretty obvious pattern: all your results are roughly 54000 ns greater than the time you requested to sleep. If you look at how GCC's this_thread::sleep_for() is implemented on GNU/Linux, you'll see it just uses nanosleep, and as Cubbi's comment says, calling that function can take around 50000 ns. I would guess some of that cost comes from making a system call, i.e. switching from user space to the kernel and back.
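
To confirm where the time goes, one can time the underlying system call directly; a minimal sketch (Linux-specific, and the timing harness is my own):

#include <time.h>   // nanosleep, timespec
#include <chrono>
#include <iostream>

int main() {
    typedef std::chrono::high_resolution_clock hrc;
    struct timespec req = { 0, 1 }; // 0 s, 1 ns: the smallest possible request
    hrc::time_point start = hrc::now();
    nanosleep(&req, nullptr); // the call sleep_for() makes on GNU/Linux
    hrc::time_point end = hrc::now();
    std::cout << "nanosleep took: "
              << std::chrono::nanoseconds(end - start).count() << " ns\n";
}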

Why does sleeping for a negative amount of time take 200+ ns, while sleeping for 0+ nanoseconds takes 50,000+ ns?

At a guess I'd say that the C library checks for the negative number and doesn't make a system call.

Is a negative sleep time a documented/supported feature, or did I accidentally stumble across some strange bug I cannot rely upon?

The standard doesn't forbid passing negative arguments, so it is allowed, and the function should return "immediately" because the time specified by the relative timeout has already passed. You can't rely on negative arguments returning faster than non-negative arguments though, that's an artefact of your specific implementation.

Is there a better C++ sleep call which would give me more consistent/predictable sleep times?

I don't think so - if I knew of one then we'd be using it in GCC to implement this_thread::sleep_for().

Edit: In more recent versions of GCC's libstdc++ I added:

if (__rtime <= __rtime.zero())
  return;

so there will be no system call when a zero or negative duration is requested.
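
On older libstdc++ versions that lack this check, the same guard can be added in user code; a sketch with a hypothetical helper name:

#include <chrono>
#include <thread>

// Hypothetical wrapper: skip the nanosleep system call entirely when the
// requested duration is zero or negative, mirroring newer libstdc++.
template <typename Rep, typename Period>
void sleep_for_checked(const std::chrono::duration<Rep, Period>& rtime) {
    if (rtime <= rtime.zero())
        return; // the relative timeout has already passed
    std::this_thread::sleep_for(rtime);
}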

answered Sep 28 '22 by Jonathan Wakely

Inspired by Straight Fast’s answer I evaluated the effects of timer_slack_ns and of SCHED_FIFO. For timer_slack_ns you have to add

#include <sys/prctl.h> // prctl
⋮
prctl (PR_SET_TIMERSLACK, 10000U, 0, 0, 0);

meaning that for the current process the timer slack is set to 10 µs instead of the default value of 50 µs. The effect is better responsiveness at the expense of slightly higher energy consumption. The process can still be run by a non-privileged user. To change the scheduling policy to SCHED_FIFO you must be “root”. The code required is

#include <unistd.h>    // getpid
#include <sched.h>     // sched_setscheduler
⋮
    const pid_t pid {getpid ()};
    struct sched_param sp = {.sched_priority = 90};
    if (sched_setscheduler (pid, SCHED_FIFO, &sp) == -1) {
        perror ("sched_setscheduler");
        return 1;
    }
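
For context, a self-contained sketch combining both settings with Stéphane's measurement (the output format is mine; the SCHED_FIFO part prints an error and continues when not run as root):

// g++ -std=c++11 -O3 -Wall slack_test.cpp
#include <chrono>
#include <cstdio>      // perror
#include <iostream>
#include <thread>
#include <sched.h>     // sched_setscheduler
#include <sys/prctl.h> // prctl
#include <unistd.h>    // getpid

typedef std::chrono::high_resolution_clock hrc;

int main() {
    // Lower this process's timer slack from the 50 µs default to 10 µs.
    prctl(PR_SET_TIMERSLACK, 10000U, 0, 0, 0);

    // Switch to the real-time FIFO scheduler (requires root).
    struct sched_param sp = {};
    sp.sched_priority = 90;
    if (sched_setscheduler(getpid(), SCHED_FIFO, &sp) == -1)
        perror("sched_setscheduler"); // keep going with the default policy

    // Measure the actual delay of a 1 ns sleep, i.e. the pure overhead.
    hrc::time_point start = hrc::now();
    std::this_thread::sleep_for(std::chrono::nanoseconds(1));
    hrc::time_point end = hrc::now();
    std::cout << "slept for: "
              << std::chrono::nanoseconds(end - start).count() << " ns\n";
}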

I ran Stéphane’s code snippets on a desktop system with GUI (Debian 9.11, kernel 4.9.189-3+deb9u2, g++ 9.2 -O3, Intel® Core™ i5-3470T CPU @ 2.90GHz).

In the first case (two consecutive time measurements) there is no system call in between, so the delay is about 260 ns and is not significantly affected by the process settings. For normally distributed timings the graphs would be straight lines, with the abscissa value at the ordinate value 0.5 being the mean and the slope representing the standard deviation. The measured values differ from that in that there are outliers towards higher delays.

In contrast to that, the second case (sleeping for one nanosecond) differs between the process setups because it involves a system call. Since the requested sleep time is so small, the sleep itself adds essentially no time, so the graphs show only the overhead.

As Stéphane observed, the default overhead is large: about 64 µs here (a bit bigger than on his machine). The time can be reduced to about 22 µs by lowering timer_slack_ns to 10 µs, and by invoking the privileged sched_setscheduler() the overhead can be cut down to about 12 µs. But as the graph shows, even in this case the delay can exceed 50 µs (in 0.0001 % of the runs).

The measurements show the basic dependence of the overhead on the process settings. Other measurements have shown that the fluctuations are more than an order of magnitude lower on non-GUI Xeon server systems.

answered Sep 28 '22 by hermannk


In the kernel, init/init_task.c defines the following parameter in struct task_struct init_task:

.timer_slack_ns = 50000, /* 50 usec default slack */

which is applied to non-RT processes in the kernel's hrtimer_nanosleep() function to make the timer's hardirqs fire less often.
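
A process can inspect and change its inherited slack from user space; a small sketch (PR_GET_TIMERSLACK returns the current value in nanoseconds as prctl()'s return value):

#include <cstdio>
#include <sys/prctl.h> // prctl

int main() {
    // Read the timer slack this process inherited (50000 ns by default).
    std::printf("timer slack: %ld ns\n",
                (long) prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0));

    // Tighten it to 10 µs; subsequent hrtimer wakeups become more precise.
    prctl(PR_SET_TIMERSLACK, 10000U, 0, 0, 0);
    std::printf("timer slack: %ld ns\n",
                (long) prctl(PR_GET_TIMERSLACK, 0, 0, 0, 0));
}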

answered Sep 28 '22 by Straight Fast