
Will an IO blocked process show 100% CPU utilization in 'top' output?

I have an analysis that can be parallelized across a variable number of processes. It is expected to be both IO and CPU intensive (very high throughput short-read DNA alignment, if anyone is curious).

The system running this is a 48-core Linux server.

The question is how to determine the optimum number of processes such that total throughput is maximized. At some point the processes will presumably become IO bound such that adding more processes will be of no benefit and possibly detrimental.

Can I tell from standard system monitoring tools when that point has been reached? Would the output of top (or maybe a different tool) enable me to distinguish between an IO-bound and a CPU-bound process? I suspect that a process blocked on IO might still show 100% CPU utilization.

Alex Stoddard asked Feb 26 '23 12:02



2 Answers

When a process is blocked on IO, it isn't running, so no time is accounted against it. If there's another process that can run, then that will run instead; if there isn't, the time is counted as 'IO wait', which is accounted as a global statistic.
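To make this concrete, here is a minimal Python sketch, assuming a Linux /proc filesystem, that pulls the scheduler state and accumulated CPU time out of one line of /proc/<pid>/stat. The function name parse_proc_stat is made up for illustration. A process blocked on disk IO typically shows state D (uninterruptible sleep), and its utime/stime counters stop advancing while it waits.

```python
def parse_proc_stat(line):
    """Parse one line of /proc/<pid>/stat (hypothetical helper).

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces, so split on the LAST ')' before splitting on whitespace.
    """
    head, _, rest = line.rpartition(")")
    fields = rest.split()
    return {
        "pid": int(head.split("(", 1)[0]),
        "state": fields[0],        # field 3: R running, S sleeping, D uninterruptible sleep, ...
        "utime": int(fields[11]),  # field 14: user-mode CPU time, in clock ticks
        "stime": int(fields[12]),  # field 15: kernel-mode CPU time, in clock ticks
    }

# Possible usage on Linux (12345 is a placeholder PID):
#   with open("/proc/12345/stat") as f:
#       print(parse_proc_stat(f.read()))
```

Sampling this twice, a second or so apart, and comparing utime + stime shows how much CPU the process was actually charged for in that interval.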

IO wait is a useful thing to monitor. It shows up in the CPU summary line of top's header as wa (the exact label varies between top versions). You can monitor it in more detail with tools like iostat and vmstat. Server Fault might be a better place to ask about that.
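If you would rather compute the number yourself, the global counters are exposed on Linux in the first line of /proc/stat. The following is a rough Python sketch under that assumption; the helper names cpu_times and iowait_percent are invented for this example. It takes two samples of the aggregate "cpu" line and reports what share of the elapsed CPU time was IO wait.

```python
def cpu_times(line):
    """Return the first eight counters from an aggregate 'cpu' line of /proc/stat.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    The trailing guest counters are skipped because guest time is
    already included in user time.
    """
    parts = line.split()
    return [int(x) for x in parts[1:9]]

def iowait_percent(before, after):
    """Percentage of CPU time between two samples that was spent in IO wait."""
    deltas = [b - a for a, b in zip(cpu_times(before), cpu_times(after))]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

# Possible usage on Linux:
#   import time
#   def sample():
#       with open("/proc/stat") as f:
#           return f.readline()
#   first = sample(); time.sleep(5); second = sample()
#   print(iowait_percent(first, second))
```

A value that climbs as you add processes, while user and system time stay flat, suggests the extra processes are mostly waiting on the disks.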

Tom Anderson answered Mar 05 '23 18:03



Even a single IO-bound process will rarely show high CPU utilization, because the operating system has scheduled its IO and the process is usually just waiting for it to complete. So top cannot accurately distinguish between an IO-bound process and a non-IO-bound process that merely uses the CPU periodically. In fact, a system horribly overloaded with IO-bound processes, barely able to accomplish anything, can exhibit very low CPU utilization.

As a first pass using only top, you can simply keep adding threads/processes until overall CPU utilization levels off; that gives the approximate configuration for a given machine.
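The same "add processes until it levels off" idea can be applied to measured throughput directly, which is what the question ultimately cares about. Here is a small Python sketch of that variation; the function name pick_process_count and the 5% threshold are assumptions, not anything from top. You would run the job at several process counts, record the throughput of each run, and stop at the last count that still gave a worthwhile gain.

```python
def pick_process_count(throughput_by_n, min_gain=0.05):
    """Return the largest process count that still improved throughput.

    throughput_by_n maps a process count to the throughput measured at
    that count (any consistent unit). Counts are visited in ascending
    order, and the search stops at the first step whose throughput is
    less than min_gain (as a fraction) above the best accepted so far.
    """
    counts = sorted(throughput_by_n)
    best = counts[0]
    for n in counts[1:]:
        if throughput_by_n[n] < throughput_by_n[best] * (1 + min_gain):
            break  # levelled off (or got worse): keep the previous count
        best = n
    return best

# Possible usage, with measurements you collected yourself:
#   pick_process_count({1: t1, 2: t2, 4: t4, 8: t8, 16: t16, 32: t32, 48: t48})
```

Stopping at the first flat step keeps the sketch simple, but it assumes throughput does not dip and then recover at a higher count, so it is worth eyeballing the full set of measurements as well.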

Rick Sladkey answered Mar 05 '23 19:03
