
Userspace Thread Latency during IO operations

I'm working on a project that runs an embedded Linux kernel, and I'm seeing thread latency problems when accessing flash memory.

My application is multithreaded and some threads have to complete a given task in less than 500 ms. The problem is that these threads are sometimes "frozen" for more than 1 second, so my 500 ms execution budget is exceeded.

This behaviour seems to be linked to flash writes, since it also occurs when I run a "dd" command from the shell that writes continuously to the flash memory.

I tried various configurations:

  • increased the priority of my real-time threads: SCHED_RR, priority=55
  • changed the IO scheduler: deadline => cfq (better: the failure occurs after 15 min instead of 3 min).
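
A minimal sketch of how the first change (SCHED_RR at priority 55) can be requested from inside a thread. It is an illustration, not necessarily what the application does: `set_rr_priority` is a hypothetical helper name, and the sketch relies on the Linux behaviour that `sched_setscheduler()` with a pid of 0 applies to the calling thread.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: request SCHED_RR at the given priority for the
 * calling thread. Returns 0 on success, or an errno value on failure
 * (typically EPERM when the process lacks the required privileges). */
int set_rr_priority(int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    int lo = sched_get_priority_min(SCHED_RR);
    int hi = sched_get_priority_max(SCHED_RR);

    /* Reject values outside the range the kernel reports for SCHED_RR. */
    if (priority < lo || priority > hi)
        return EINVAL;

    /* On Linux, pid 0 means "the calling thread". */
    if (sched_setscheduler(0, SCHED_RR, &sp) != 0)
        return errno;
    return 0;
}
```

Calling `set_rr_priority(55)` at the start of each real-time thread is one way to apply it. A real-time priority only decides which runnable thread gets the CPU; it does not shorten the time a thread spends blocked inside the kernel.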

Using ftrace, I could see that during the "freeze" some threads and processes are still running, with a lot of idle-task time between the other tasks (each idle timeslot lasts > 20 ms):

  • 2 network threads (SCHED_RR, priority=50)
  • dd process

I don't understand:

  • Why all the other tasks are "locked" for this whole time (sometimes while requesting a mutex, sometimes while calculating a simple 16-bit CRC).
  • Why so much idle time can be seen with ftrace (between sched events) during this period.
  • Why higher application thread priorities don't solve the issue.

I suspect something linked to IO management in the kernel, as if the kernel preempted every non-IO thread in order to do all the work related to IO (network, files, ...).

Does anybody have an idea of what might cause this latency?

My kernel settings:

  • Linux kernel version 2.6.39
  • Preempt option enabled
  • tickless
  • HZ=1000
  • CFQ scheduler (Default settings)

Edit:

As I'm not an expert, I'm sharing an ftrace capture (to be viewed with kernelshark): https://drive.google.com/file/d/0B6pJb20-D0D2NHZBUHJVRlV0aDg/view?usp=sharing

Maybe it will help you see what is really happening on my system.

In this capture I used an external "dd" command to reproduce behavior similar to what I encounter with my application under nominal conditions.

The "hole" ("freeze"), during which no more custom ftrace markers from my application appear, is at timestamps:

  • begin: 469.118370
  • end: 469.802940

Another, smaller "hole":

  • begin: 469.807644
  • end: 469.952975
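
For readers who want to produce similar custom markers: one common way for an application to annotate an ftrace capture is to write short strings to the `trace_marker` file. The sketch below assumes debugfs is mounted at `/sys/kernel/debug` (the path may differ on a given target); `trace_marker_open` and `trace_mark` are hypothetical helper names, not taken from the application above.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Usual location when debugfs is mounted at /sys/kernel/debug
 * (an assumption; adjust for the target system). */
#define TRACE_MARKER_PATH "/sys/kernel/debug/tracing/trace_marker"

/* Open the marker file once at start-up. Returns a file descriptor,
 * or -1 if the file is missing or not writable. */
int trace_marker_open(const char *path)
{
    return open(path, O_WRONLY);
}

/* Emit one marker string. Returns 0 on success, -1 on failure. */
int trace_mark(int fd, const char *msg)
{
    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    return (written == (ssize_t)len) ? 0 : -1;
}
```

A thread would typically call `trace_marker_open(TRACE_MARKER_PATH)` once and then `trace_mark(fd, "task begin")` and `trace_mark(fd, "task end")` around its time-critical work, so that a gap between markers stands out in the capture.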
asked Mar 18 '15 16:03 by Patrick Duvall


1 Answer

I think this may be because the kernel has decided it must flush some filesystem metadata, or do other filesystem housekeeping, and it stalls your process until enough of that work is done.

I had similar problems and used multi-threading and a userland buffer to absorb the stalls. See my old question and answer here.
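
A rough sketch of that idea, under stated assumptions: the time-critical thread only copies data into an in-memory ring buffer, and a separate writer thread drains the buffer to the flash, so only the writer blocks when the kernel stalls. All names (`struct ring`, `ring_put`, `ring_writer`, `RB_CAPACITY`) and sizes are hypothetical; this is not the code from the linked question.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RB_CAPACITY 4096 /* hypothetical; size it for the longest expected stall */

struct ring {
    unsigned char data[RB_CAPACITY];
    size_t head;   /* index of the oldest stored byte */
    size_t len;    /* number of bytes currently stored */
    int closed;    /* set once no more data will arrive */
    int fd;        /* destination, e.g. a file on the flash */
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
};

void ring_init(struct ring *r, int fd)
{
    memset(r, 0, sizeof *r);
    r->fd = fd;
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->nonempty, NULL);
}

/* Producer side, called from the time-critical thread. It only copies
 * memory and never waits for I/O. Returns the number of bytes accepted,
 * which is less than n when the buffer is full. */
size_t ring_put(struct ring *r, const void *src, size_t n)
{
    const unsigned char *p = src;
    pthread_mutex_lock(&r->lock);
    size_t room = RB_CAPACITY - r->len;
    if (n > room)
        n = room;
    for (size_t i = 0; i < n; i++)
        r->data[(r->head + r->len + i) % RB_CAPACITY] = p[i];
    r->len += n;
    pthread_cond_signal(&r->nonempty);
    pthread_mutex_unlock(&r->lock);
    return n;
}

/* Tell the writer that no more data will be queued. */
void ring_close(struct ring *r)
{
    pthread_mutex_lock(&r->lock);
    r->closed = 1;
    pthread_cond_signal(&r->nonempty);
    pthread_mutex_unlock(&r->lock);
}

/* Consumer side, meant to run in its own lower-priority thread. */
void *ring_writer(void *arg)
{
    struct ring *r = arg;
    unsigned char chunk[512];
    for (;;) {
        pthread_mutex_lock(&r->lock);
        while (r->len == 0 && !r->closed)
            pthread_cond_wait(&r->nonempty, &r->lock);
        if (r->len == 0) { /* closed and fully drained */
            pthread_mutex_unlock(&r->lock);
            return NULL;
        }
        size_t n = r->len < sizeof chunk ? r->len : sizeof chunk;
        for (size_t i = 0; i < n; i++)
            chunk[i] = r->data[(r->head + i) % RB_CAPACITY];
        r->head = (r->head + n) % RB_CAPACITY;
        r->len -= n;
        pthread_mutex_unlock(&r->lock);

        /* Write with the lock released, so a stalled write() never
         * holds up the producer. */
        size_t off = 0;
        while (off < n) {
            ssize_t w = write(r->fd, chunk + off, n - off);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return NULL; /* this sketch gives up on a hard error */
            }
            off += (size_t)w;
        }
    }
}
```

The writer would be started with something like `pthread_create(&tid, NULL, ring_writer, &r)`. The buffer has to be large enough to hold everything produced during the longest stall; when it is full, `ring_put` accepts fewer bytes and the caller must decide whether to drop or retry.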

answered Oct 03 '22 09:10 by blueshift