Scheduling events at microsecond granularity in POSIX

Tags: c, posix, timer, real-time, poisson

I'm trying to determine the finest granularity at which I can accurately schedule tasks in C/C++. At the moment I can reliably schedule a task every 5 microseconds, but I'd like to see if I can lower this further.

Any advice on how to achieve this / if it is possible would be greatly appreciated.

Since timer granularity is often OS-dependent: I am currently running on Linux, but would use Windows if its timing granularity were better (although I don't believe it is, based on what I've found for QueryPerformanceCounter).

I execute all measurements on bare metal (no VM). /proc/timer_list confirms nanosecond timer resolution for my CPU (but I know that doesn't translate to nanosecond alarm resolution).

Current

My current code can be found as a Gist here

At the moment, I'm able to execute a request every 5 microseconds (5000 nanoseconds) with less than 1% late arrivals. When late arrivals do occur, they are typically only one cycle (5000 nanoseconds) behind.

I'm doing 3 things at the moment:

Setting the process to real-time priority (as pointed out by @Spudd86 here)

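struct sched_param schedparm;
memset(&schedparm, 0, sizeof(schedparm));
schedparm.sched_priority = 99; // highest rt priority on Linux
// Requires root or CAP_SYS_NICE; returns -1 and sets errno on failure.
if (sched_setscheduler(0, SCHED_FIFO, &schedparm) == -1)
    perror("sched_setscheduler");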

Minimizing the timer slack

prctl(PR_SET_TIMERSLACK, 1);

Using timerfds (available since Linux 2.6.25)

int timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
struct itimerspec timspec;
memset(&timspec, 0, sizeof(timspec));
timspec.it_interval.tv_sec = 0;
timspec.it_interval.tv_nsec = nanosecondInterval; // period between expirations
timspec.it_value.tv_sec = 0;
timspec.it_value.tv_nsec = 1; // first expiration; must be nonzero to arm the timer

timerfd_settime(timerfd, 0, &timspec, NULL);

Possible improvements

Dedicate a processor to this process?
Use a nonblocking timerfd so that I can create a tight loop, instead of blocking (tight loop will waste more CPU, but may also be quicker to respond to an alarm)
Using an external embedded device for triggering (can't imagine why this would be better)

Why

I'm currently working on creating a workload generator for a benchmarking engine. The workload generator simulates an arrival rate (X requests / second, etc.) using a Poisson process. From the Poisson process, I can determine the relative times at which requests must be made from the benchmarking engine.

So for instance, at 10 requests a second, we may have requests made at: t = 0.02, 0.04, 0.05, 0.056, 0.09 seconds

These requests need to be scheduled in advance and then executed. As the number of requests per second increases, the granularity required for scheduling these requests increases (thousands of requests per second requires sub-millisecond accuracy). As a result, I'm trying to figure out how to scale this system further.


asked Nov 12 '13 10:11

BSchlinker

2 Answers

You're very close to the limits of what vanilla Linux will offer you, and well past what it can guarantee. Adding the real-time patches (PREEMPT_RT) to your kernel and tuning for full preemption will give you better guarantees under load. I would also remove any dynamic memory allocation from your time-critical code: malloc and friends can (and will) stall for a not-inconsequential (in a real-time sense) period of time if the kernel has to reclaim memory from the I/O cache. I would also consider removing swap from that machine to help guarantee performance. Dedicating a processor to your task will help avoid context-switch delays but, again, it's no guarantee.
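One possible way to act on the memory advice, sketched under assumptions: carve all time-critical buffers out of a fixed pool allocated up front, touch the pool once, and ask the kernel to keep the process's pages resident with mlockall(). mlockall is not mentioned above; it is a commonly used alternative to removing swap entirely. The pool size and the names pool_alloc and lock_memory are made up for illustration, and the code assumes a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define POOL_BYTES ((size_t)1 << 20) /* hypothetical pool size: 1 MiB */

static _Alignas(max_align_t) unsigned char pool[POOL_BYTES];
static size_t pool_used;

/* Hypothetical bump allocator: hands out aligned chunks of the static
 * pool and never frees. Returns NULL when the pool is exhausted. */
void *pool_alloc(size_t n)
{
    const size_t a = _Alignof(max_align_t);

    if (n > POOL_BYTES) /* also keeps the rounding below from overflowing */
        return NULL;
    size_t rounded = (n + a - 1) / a * a;
    if (rounded > POOL_BYTES - pool_used)
        return NULL;

    void *p = pool + pool_used;
    pool_used += rounded;
    return p;
}

/* Hypothetical helper: touch the pool so its pages are faulted in, then
 * try to lock current and future pages in RAM. Returns 0 on success or
 * -1 (errno set) if mlockall fails, e.g. because of RLIMIT_MEMLOCK. */
int lock_memory(void)
{
    memset(pool, 0, sizeof(pool));
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

The intent is that lock_memory() and every pool_alloc() call happen during start-up, so the timing loop itself never calls malloc or takes a page fault.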

I would also suggest that you be careful with that level of sched_priority: you're above various important bits of Linux there, which can lead to very strange effects.


answered Sep 27 '22 19:09

Joe

What you gain from building a realtime kernel is more reliable guarantees (i.e. lower maximum latency) of the time between an I/O or timer event being handled by the kernel and control being passed to your app in response. This comes at the price of lower throughput, and you might notice an increase in your best-case latency times.

However, the only reason to use OS timers to schedule events with high precision is if you're afraid of burning CPU cycles in a loop while you wait for your next due event. OS timers (especially in MS Windows) are not reliable for fine-grained timing events, and are very dependent on the sort of timing/HPET hardware available in your system.

When I require highly accurate event scheduling, I use a hybrid method. First, I measure the worst-case latency, that is, the biggest difference between the time I requested to sleep until and the actual clock time after sleeping. Let's call this difference "D". (You can actually do this on the fly during normal running by tracking "D" every time you sleep, with something like "D = (D*7 + lastD) / 8" to produce a running average.)

Then never request to sleep beyond "N - D*2", where "N" is the time of the next event. When within "D*2" time of the next event, enter a spin loop and wait for "N" to occur.

This eats a lot more CPU cycles, but depending on the accuracy you require, you might be able to get away with a "sched_yield()" in your spin loop, which is more kind to your system.

answered Sep 27 '22 19:09

mdw
