
What could be delaying my select() call?

Tags: c, linux, real-time

I have a small program running on Linux (on an embedded PC, dual-core Intel Atom 1.6GHz with Debian 6 running Linux 2.6.32-5) which communicates with external hardware via an FTDI USB-to-serial converter (using the ftdi_sio kernel module and a /dev/ttyUSB* device). Essentially, in my main loop I run

  • clock_gettime() using CLOCK_MONOTONIC
  • select() with a timeout of 8 ms
  • clock_gettime() as before
  • Output the time difference of the two clock_gettime() calls
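The measurement steps above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the helper names `elapsed_ms` and `timed_select_ms` are hypothetical, and passing `-1` as the descriptor (wait on nothing, just sleep for the timeout) is a convenience added for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/select.h>
#include <time.h>

/* Elapsed time between two CLOCK_MONOTONIC readings, in milliseconds. */
static double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1e3
         + (end->tv_nsec - start->tv_nsec) / 1e6;
}

/*
 * Time a single select() call that waits for fd to become readable.
 * Pass fd = -1 to wait on no descriptor at all (pure timeout).
 * Returns the elapsed time in milliseconds, or -1.0 on error.
 */
static double timed_select_ms(int fd, long timeout_us)
{
    struct timespec before, after;
    struct timeval tv;
    fd_set readfds;

    FD_ZERO(&readfds);
    if (fd >= 0)
        FD_SET(fd, &readfds);

    /* Linux select() may modify the timeout, so set it before every call. */
    tv.tv_sec = timeout_us / 1000000;
    tv.tv_usec = timeout_us % 1000000;

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1.0;
    if (select(fd >= 0 ? fd + 1 : 0, fd >= 0 ? &readfds : NULL,
               NULL, NULL, &tv) < 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1.0;

    return elapsed_ms(&before, &after);
}
```

In the main loop one would call `timed_select_ms(tty_fd, 8000)` once per iteration and print or log the result, where `tty_fd` stands for the already opened /dev/ttyUSB* descriptor.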

To have some level of "soft" real-time guarantees, this thread runs as SCHED_FIFO with maximum priority (showing up as "RT" in top). It is the only thread in the system running at this priority, no other process has such priorities. My process has one other SCHED_FIFO thread with a lower priority, while everything else is at SCHED_OTHER. The two "real-time" threads are not CPU bound and do very little apart from waiting for I/O and passing on data.
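For reference, a thread can put itself into SCHED_FIFO at the highest priority roughly like this. It is only a sketch: the function name `make_thread_fifo_max` is hypothetical, and it relies on the Linux behaviour that `sched_setscheduler()` with a pid of 0 applies to the calling thread. The call normally needs root or CAP_SYS_NICE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Switch the calling thread to SCHED_FIFO at the maximum priority.
 * Returns 0 on success, or an errno value on failure
 * (typically EPERM when the process lacks the required privilege).
 */
static int make_thread_fifo_max(void)
{
    struct sched_param param;
    int max_prio = sched_get_priority_max(SCHED_FIFO);

    if (max_prio == -1)
        return errno;

    param.sched_priority = max_prio;

    /* On Linux, pid 0 means "the calling thread". */
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;

    return 0;
}
```

The second, lower-priority SCHED_FIFO thread would do the same with a smaller `sched_priority` value.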

The kernel I am using has no RT_PREEMPT patches (I might switch to that patch in the future). I know that if I want "proper" realtime, I need to switch to RT_PREEMPT or, better, Xenomai or the like. But nevertheless I would like to know what is behind the following timing anomalies on a "vanilla" kernel:

  • Roughly 0.03% of all select() calls are timed at over 10 ms (remember, the timeout was 8 ms).
  • The three worst cases (out of over 12 million calls) were 31.7 ms, 46.8 ms and 64.4 ms.
  • All of the above happened within 20 seconds of each other, and I think some cron job may have been interfering (although the system logs are low on information apart from the fact that cron.daily was being executed at the time).
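Figures like these can be collected with a small accumulator along the following lines. The struct and function names are made up for this illustration, and the 10 ms limit is passed in by the caller rather than fixed.

```c
/* Running statistics over the measured select() durations. */
struct latency_stats {
    unsigned long calls;       /* total number of samples */
    unsigned long over_limit;  /* samples that exceeded the limit */
    double worst_ms;           /* largest sample seen so far */
};

/* Record one sample (in ms) against the given limit (in ms). */
static void latency_record(struct latency_stats *s, double ms, double limit_ms)
{
    s->calls++;
    if (ms > limit_ms)
        s->over_limit++;
    if (ms > s->worst_ms)
        s->worst_ms = ms;
}

/* Percentage of samples that exceeded the limit (0 if nothing recorded). */
static double latency_over_limit_percent(const struct latency_stats *s)
{
    return s->calls ? 100.0 * s->over_limit / s->calls : 0.0;
}
```

Start with a zero-initialised `struct latency_stats` and call `latency_record(&stats, ms, 10.0)` after each timed select(). Logging a wall-clock timestamp whenever `worst_ms` changes makes it easier to correlate outliers with things like cron runs.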

So, my question is: What factors can be involved in such extreme cases? Is this just something that can happen inside the Linux kernel itself, i.e. would I have to switch to RT_PREEMPT, or even a non-USB interface and Xenomai, to get more reliable guarantees? Could /proc/sys/kernel/sched_rt_runtime_us be biting me? Are there any other factors I may have missed?

Another way to put this question is, what else can I do to reduce these latency anomalies without switching to a "harder" realtime environment?

Update: I have observed a new, "worse worst case" of about 118.4 ms (once over a total of around 25 million select() calls). Even when I am not using a kernel with any sort of realtime extension, I am somewhat worried by the fact that a deadline can apparently be missed by over a tenth of a second.

asked May 20 '15 07:05 by mindriot


1 Answer

Without more information it is difficult to point to something specific, so I am just guessing here:

  1. Interrupts, and code triggered by interrupts, take so much time in the kernel that your real-time thread is significantly delayed. How much depends on the frequency of interrupts, which interrupt handlers are involved, etc.
  2. On a kernel that is not fully preemptible, a lower-priority thread that is executing inside the kernel will not be preempted until it yields the CPU or leaves the kernel.
  3. As pointed out in this SO answer, CPU System Management Interrupts and thermal management can also cause significant delays (up to 300 ms were observed by the poster).

118 ms seems quite a lot for a 1.6 GHz CPU. But a single driver that accidentally locks the CPU for some time would be enough. If you can, try to disable some drivers or use different driver/hardware combinations.

sched_rt_period_us and sched_rt_runtime_us should not be a problem if they are set to reasonable values and your code behaves as you expect. Still, I would remove the limit for RT threads (writing -1 to /proc/sys/kernel/sched_rt_runtime_us disables it) and see what happens.
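To see how much CPU time the RT throttling settings actually leave to real-time threads, the two values can be read and compared, for example like this. The helper names are hypothetical. The values are commonly 950000 and 1000000, which corresponds to 95% of each period, but check what your own system reports.

```c
#include <stdio.h>

/* Read a single integer from a file such as
 * /proc/sys/kernel/sched_rt_runtime_us.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_long_from_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Fraction of each period that RT tasks are allowed to run.
 * A negative runtime (-1) means throttling is disabled, i.e. 1.0. */
static double rt_budget_fraction(long runtime_us, long period_us)
{
    if (runtime_us < 0)
        return 1.0;
    if (period_us <= 0)
        return 0.0;  /* invalid period: treat as "no budget" */
    return (double)runtime_us / (double)period_us;
}
```

Read both files under /proc/sys/kernel/ with `read_long_from_file()` and pass the results to `rt_budget_fraction()`. Since the two threads in the question are mostly waiting for I/O, they should stay far below any such budget, which is why throttling is an unlikely culprit here.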

What else can you do? Write a device driver! It's not that difficult and interrupt handlers get a higher priority than realtime threads. It may be easier to switch to a real time kernel but YMMV.

answered Oct 18 '22 10:10 (3 revs, 2 users 90%)