Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to detect if a thread or process is getting starved due to OS scheduling

This is on Linux OS. App is written in C++ with ACE library.

I am suspecting that one of the thread in the process is getting blocked for unusually long time(5 to 40 seconds) sometimes. The app runs fine most of the times except couple times a day it has this issue. There are other similar 5 apps running on the box which are also I/O bound due to heavy socket incoming data.

I would like to know if there is any thing I can do programatically to see if the thread/process are getting their time slice.

like image 971
Medicine Avatar asked Jun 19 '12 21:06

Medicine


1 Answers

If a process is being starved out, self monitoring for that process would not be that productive. But, if you just want that process to notice it hasn't been run in a while, it can call times periodically and compare the relative difference in elapsed time with the relative difference in scheduled user time (you would sum the tms_utime and tms_cutime fields if you want to count waiting for children as productive time, and you would sum in the tms_stime and tms_cstime fields if you count kernel time spent on your behalf to be productive time). For thread times, the only way I know of is to consult the /proc filesystem.

A high priority external process or high priority thread could externally monitor processes (and threads) of interest by reading the appropriate /proc/<pid>/stat entries for the process (and /proc/<pid>/task/<tid>/stat for the threads). The user times are found in the 14th and 16th fields of the stat file. The system times are found in the 15th and 17th fields. (The field positions are accurate for my Linux 2.6 kernel.)

Between two time points, you determine the amount of elapsed time that has passed (a monitor process or thread would usually wake up at regular intervals). Then the difference between the cumulative processing times at each of those time points represents how much time the thread of interest got to run during that time. The ratio of processing time to elapsed time would represent the time slice.

One last bit of info: On Linux, I use the following to obtain the tid of the current thread for examining the right task in the /proc/<pid>/task/ directory:

tid = syscall(__NR_gettid);

I do this, because I could not find the gettid system call actually exported by any library on my system, even though it was documented. But, it might be available on yours.

like image 85
jxh Avatar answered Nov 08 '22 15:11

jxh