
Scheduling events at microsecond granularity in POSIX

I'm trying to determine how finely I can schedule tasks in C/C++ and still have them fire on time. At the moment I can reliably schedule a task every 5 microseconds, and I'd like to see whether I can lower this further.

Any advice on how to achieve this / if it is possible would be greatly appreciated.

Since I know timer granularity is often OS dependent: I am currently running on Linux, but would use Windows if its timing granularity were better (although I don't believe it is, based on what I've found about QueryPerformanceCounter).

I run all measurements on bare metal (no VM). /proc/timer_list confirms nanosecond timer resolution for my CPU (but I know that doesn't translate to nanosecond alarm resolution).

Current

My current code can be found as a Gist here

At the moment, I'm able to execute a request every 5 microseconds (5000 nanoseconds) with less than 1% late arrivals. When late arrivals do occur, they are typically only one cycle (5000 nanoseconds) behind.

I'm doing 3 things at the moment

Setting the process to real-time priority (as pointed out by @Spudd86 here)

struct sched_param schedparm;
memset(&schedparm, 0, sizeof(schedparm));
schedparm.sched_priority = 99; // highest rt priority
sched_setscheduler(0, SCHED_FIFO, &schedparm);

Minimizing the timer slack

prctl(PR_SET_TIMERSLACK, 1);

Using timerfds (available since Linux 2.6.25)

int timerfd = timerfd_create(CLOCK_MONOTONIC,0);
struct itimerspec timspec;
memset(&timspec, 0, sizeof(timspec)); // zero the struct (bzero is deprecated)
timspec.it_interval.tv_sec = 0;
timspec.it_interval.tv_nsec = nanosecondInterval;
timspec.it_value.tv_sec = 0;
timspec.it_value.tv_nsec = 1;

timerfd_settime(timerfd, 0, &timspec, 0);

Possible improvements

  1. Dedicate a processor to this process?
  2. Use a nonblocking timerfd so that I can create a tight loop, instead of blocking (tight loop will waste more CPU, but may also be quicker to respond to an alarm)
  3. Using an external embedded device for triggering (can't imagine why this would be better)
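For the first idea, one possible sketch is to pin the scheduling thread to a single CPU with sched_setaffinity. This only keeps the thread on that CPU; it does not by itself stop the kernel from running other tasks there, so it is an assumption that the chosen CPU has been kept otherwise idle. The function name pin_to_cpu is hypothetical.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t macros and sched_setaffinity */
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by the kernel). */
static int pin_to_cpu(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Calling pin_to_cpu once at startup, before the timer loop begins, should be enough; the affinity mask persists for the life of the thread.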

Why

I'm currently working on creating a workload generator for a benchmarking engine. The workload generator simulates an arrival rate (X requests / second, etc.) using a Poisson process. From the Poisson process, I can determine the relative times at which requests must be made from the benchmarking engine.

So for instance, at 10 requests a second, we may have requests made at: t = 0.02, 0.04, 0.05, 0.056, 0.09 seconds

These requests need to be scheduled in advance and then executed. As the number of requests per second increases, the scheduling granularity has to become finer (thousands of requests per second require sub-millisecond accuracy). As a result, I'm trying to figure out how to scale this system further.
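To illustrate what "scheduled in advance" could look like, here is a sketch that turns a precomputed offset (nanoseconds since the start of the run) into an absolute CLOCK_MONOTONIC deadline and sleeps until it. This is an assumption about one possible design, not the code from the Gist; it uses clock_nanosleep rather than a periodic timerfd because Poisson arrivals are not evenly spaced. The names deadline_after and sleep_until are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Return base + offset_ns, keeping tv_nsec within [0, 1e9). */
static struct timespec deadline_after(struct timespec base, uint64_t offset_ns)
{
    assert(base.tv_nsec >= 0 && base.tv_nsec < NSEC_PER_SEC);

    struct timespec d = base;
    d.tv_sec  += (time_t)(offset_ns / NSEC_PER_SEC);
    d.tv_nsec += (long)(offset_ns % NSEC_PER_SEC);
    if (d.tv_nsec >= NSEC_PER_SEC) {   /* carry into seconds */
        d.tv_nsec -= NSEC_PER_SEC;
        d.tv_sec  += 1;
    }
    return d;
}

/* Sleep until an absolute CLOCK_MONOTONIC deadline, retrying if a signal
 * interrupts the sleep. Returns 0 on success or an error number. */
static int sleep_until(struct timespec deadline)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (rc == EINTR);
    return rc;
}
```

Because every deadline is computed from the same start time and passed with TIMER_ABSTIME, a late wake-up for one request does not push back the deadlines of the requests after it.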

BSchlinker asked Nov 12 '13 10:11



2 Answers

You're very close to the limits of what vanilla Linux will offer you, and well past what it can guarantee. Adding the real-time patches to your kernel and tuning for full pre-emption will give you better guarantees under load. I would also remove any dynamic memory allocation from your time-critical code: malloc and friends can (and will) stall for a significant (in a real-time sense) period of time if memory has to be reclaimed from the I/O cache. I would also consider disabling swap on that machine to help guarantee performance. Dedicating a processor to your task will help to avoid context-switch overhead but, again, it's no guarantee.
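A rough sketch of the memory advice, assuming the workload's buffers can be sized before the timed phase starts. mlockall keeps the process's pages resident so they cannot be swapped out, and touching a buffer up front means the hot path never takes a first-touch page fault. The function names are hypothetical, and mlockall can fail without sufficient privileges or a large enough RLIMIT_MEMLOCK.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Lock all current and future pages of this process into RAM.
 * Returns 0 on success, -1 on error (for example EPERM or ENOMEM). */
static int lock_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Allocate a buffer once, before the timed phase, and write to every byte
 * so the pages are actually backed by physical memory. */
static void *preallocate(size_t bytes)
{
    assert(bytes > 0);
    void *p = malloc(bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}
```

The idea is to call both during start-up and then reuse the preallocated buffers inside the request loop instead of calling malloc there.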

I would also suggest that you be careful with that level of sched_priority: you're above various important bits of Linux there, which can lead to very strange effects.

Joe answered Sep 27 '22 19:09


What you gain from building a realtime kernel is more reliable guarantees (i.e. lower maximum latency) on the time between an I/O or timer event being handled by the kernel and control being passed to your app in response. This comes at the price of lower throughput, and you might notice an increase in your best-case latency.

However, the only reason to use OS timers for high-precision event scheduling is if you're afraid of burning CPU cycles in a loop while you wait for your next due event. OS timers (especially in MS Windows) are not reliable for fine-grained timing events, and are very dependent on the sort of timing/HPET hardware available in your system.

When I require highly accurate event scheduling, I use a hybrid method. First, I measure the worst-case latency, that is, the biggest difference between the time I requested to wake up and the actual clock time after sleeping. Let's call this difference "D". (You can also track it on the fly during normal running, by updating "D" every time you sleep with something like "D = (D*7 + lastD) / 8", which gives a running average weighted toward recent sleeps.)

Then never request to sleep beyond "N - D*2", where "N" is the time of the next event. When within "D*2" time of the next event, enter a spin loop and wait for "N" to occur.

This eats a lot more CPU cycles, but depending on the accuracy you require, you might be able to get away with a "sched_yield()" in your spin loop, which is kinder to your system.
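The hybrid method described above might be sketched as follows. This is an illustration under assumptions, not the answerer's actual code: it uses CLOCK_MONOTONIC with clock_nanosleep, the names now_ns, wait_until and oversleep_ns are hypothetical, and the starting value for "D" is an arbitrary guess.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec t;
    clock_gettime(CLOCK_MONOTONIC, &t);
    return (int64_t)t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Running estimate of the oversleep "D" in nanoseconds.
 * Assumption: 50 us is only a starting guess and should be measured. */
static int64_t oversleep_ns = 50000;

/* Wait until target_ns ("N"): sleep until N - D*2, then spin to N. */
static void wait_until(int64_t target_ns)
{
    assert(target_ns >= 0);
    int64_t wake_ns = target_ns - 2 * oversleep_ns;

    if (wake_ns > now_ns()) {
        struct timespec ts = {
            .tv_sec  = (time_t)(wake_ns / 1000000000LL),
            .tv_nsec = (long)(wake_ns % 1000000000LL),
        };
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL) == EINTR) {
            /* interrupted by a signal: retry with the same absolute deadline */
        }
        int64_t last = now_ns() - wake_ns;   /* how late the sleep returned */
        if (last < 0)
            last = 0;
        oversleep_ns = (oversleep_ns * 7 + last) / 8;   /* D = (D*7 + lastD) / 8 */
    }

    while (now_ns() < target_ns) {
        /* busy-wait; a sched_yield() could go here if less accuracy is acceptable */
    }
}
```

One design note: because the estimate is an average rather than a true worst case, the "D*2" margin is what absorbs occasional longer oversleeps; a larger multiplier trades CPU time for fewer late events.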

mdw answered Sep 27 '22 19:09