A high iowait means your CPU is waiting on requests, but you'll need to investigate further to confirm the source and effect. For example, server storage (SSD, NVMe, NFS, etc.) is almost always slower than CPU performance.
iowait is simply a form of idle time when nothing could be scheduled. The value may or may not be useful in indicating a performance problem, but it does tell the user that the system is idle and could have taken more work. Comments: A CPU can be in one of four states: user, sys, idle, or iowait.
A value less than 10ms for SSD or 100ms for hard drives is usually not cause for concern, and lower is better.
iowait is time that the processor/processors are waiting (i.e. is in an idle state and does nothing), during which there in fact was outstanding disk I/O requests. This usually means that the block devices (i.e. physical disks, not memory) is too slow, or simply saturated.
I have a concurrent system with many machines/nodes involved. Each machine run several JVMs doing different stuff. It is a "layered" architecture where each layer consists of many JVM running across the machines. Basically the top-layer JVM receives input from the outside via files, parses the input and sends it as many small records for "storage" in layer-two. Layer-two doesn't actually persist the data itself but actually persists it in layer-three (HBase and Solr) and HBase actually doesn't persist it itself either since it sends it to layer-four (HDFS) for persistence.
Most of the communication among the layers is synchronized so of course it ends up in a lot of threads waiting for lower layers to complete. But I would expect those waiting threads to be "free" wrt CPU usage.
I see a very high iowait (%wa in top) though - something like 80-90% iowait and only 10-20% sys/usr CPU usage. The system seems exhausted - slow to login via ssh and slow to respond to commands etc.
My question is if all those JVM threads waiting for lower layers to complete can cause this? Is it not supposed to be "free" waiting for responses (sockets). Does it matter with respect to this, whether the different layers uses blocking or non-blocking (NIO) io? Exactly in what situations does Linux count something as iowait (%wa in top)? When all threads in all JVMs on the machines are in a situation where it is waiting (counting because there is no other thread to run to do something meaningful in the meantime)? Or does threads waiting also count in %wa even though there are other processes ready to use the CPU for real processing?
I would really want to get a thorough explanation on how it works and how to interpret this high %wa. In the beginning I guessed that it counted as %wa when all threads where waiting, but that there where actually plenty of room for doing more, so I tried to increase the number of threads expecting to get more throughput, but that doesn't happen. So it is a real problem, not just a "visual" problem looking at top.
The output below is taken from a machine where only HBase and HDFS is running. It is on machines with HBase and/or HDFS that the problem i showing (most clearly)
--- jps ---
19498 DataNode
19690 HRegionServer
19327 SecondaryNameNode
---- typical top -------
top - 11:13:21 up 14 days, 18:20,  1 user,  load average: 4.83, 4.50, 4.25
Tasks:  99 total,   1 running,  98 sleeping,   0 stopped,   0 zombie
Cpu(s): 14.1%us,  4.3%sy,  0.0%ni,  5.4%id, 74.8%wa,  0.0%hi,  1.3%si,  0.0%st
Mem:   7133800k total,  7099632k used,    34168k free,    55540k buffers
Swap:   487416k total,      248k used,   487168k free,  2076804k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM TIME+
COMMAND
19690 hbase     20   0 4629m 4.2g 9244 S   51 61.7 194:08.84 java
19498 hdfs      20   0 1030m 116m 9076 S   16  1.7  75:29.26 java
---- iostat -kd 1 ----
root@edrxen1-2:~# iostat -kd 1
Linux 2.6.32-29-server (edrxen1-2)      02/22/2012      _x86_64_        (2 CPU)
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              3.53         3.36        15.66    4279502   19973226
dm-0            319.44      6959.14       422.37 8876213913  538720280
dm-1              0.00         0.00         0.00        912        624
xvdb            229.03      6955.81       406.71 8871957888  518747772
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
dm-0            122.00      3852.00         0.00       3852          0
dm-1              0.00         0.00         0.00          0          0
xvdb            105.00      3252.00         0.00       3252          0
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda              0.00         0.00         0.00          0          0
dm-0             57.00      1712.00         0.00       1712          0
dm-1              0.00         0.00         0.00          0          0
xvdb             78.00      2428.00         0.00       2428          0
--- iostat -x ---
Linux 2.6.32-29-server (edrxen1-2)      02/22/2012      _x86_64_        (2 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.06    0.00    3.29   65.14    0.08   23.43
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.74    0.35    3.18     6.72    31.32    10.78     0.11   30.28   6.24   2.20
dm-0              0.00     0.00  213.15  106.59 13866.95   852.73    46.04     1.29   14.41   2.83  90.58
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    5.78   1.12   0.00
xvdb              0.07    86.97  212.73   15.69 13860.27   821.42    64.27     2.44   25.21   3.96  90.47
--- free -o ----
             total       used       free     shared    buffers     cached
Mem:       7133800    7099452      34348          0      55612    2082364
Swap:       487416        248     487168
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With