While profiling my homegrown web application, I came across the following very strange (at least to me) observation.
Almost all of the time is spent in the socketRead0() method of the SocketInputStream class. That is not surprising, because my application talks to a remote service on each request. What is strange is that not only the wall-clock time for this method is high, the CPU time is also very high. I can't understand why the CPU time is high: if my application is waiting for the remote service to reply (which is in fact not so fast), there is nothing left for the application itself to do, so the CPU time should presumably be low.
Some more observations:
- SocketInputStream.socketRead0() is eating up to 95% of the time (both wall-clock time and CPU time);
- mpstat (we use Linux as the OS) shows around ~90% user time and ~1-3% system time (the rest is idle time);
- we use RestTemplate to interact with the remote service, not SocketInputStream directly.
For now I have only one idea: maybe this is the overhead of calling native methods in the JVM (SocketInputStream.socketRead0() is native)?
What do you think? Are there any other reasons for this?
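One way to cross-check whether the profiler's CPU figures are real is to ask the JVM for per-thread CPU time directly via ThreadMXBean. In this minimal sketch (class name is illustrative, and Thread.sleep stands in for a blocking remote call), a thread that spends its time blocked accumulates almost no CPU time, even though a sampling profiler may attribute its RUNNABLE samples to CPU usage:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWallClock {
    public static void main(String[] args) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();

        long wallStart = System.nanoTime();
        long cpuStart = mx.getCurrentThreadCpuTime();

        // Stand-in for a blocking network call such as socketRead0:
        // the thread is descheduled by the OS while it waits.
        Thread.sleep(1000);

        long wallMs = (System.nanoTime() - wallStart) / 1_000_000;
        long cpuMs = (mx.getCurrentThreadCpuTime() - cpuStart) / 1_000_000;

        // Wall-clock time is ~1000 ms, but CPU time stays near zero,
        // because a blocked thread consumes no CPU cycles.
        System.out.printf("wall=%dms cpu=%dms%n", wallMs, cpuMs);
    }
}
```

If this reports near-zero CPU time while your profiler reports 95% CPU in socketRead0, the profiler is likely a sampling profiler that counts samples of threads in the RUNNABLE state as CPU usage, even when they are blocked in native I/O.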
SocketInputStream.socketRead0() is the actual native blocking I/O operation executed by the JDK to read and receive data from the remote source. This is why thread-hanging problems on this operation are so common in the Java EE world.
I am facing the same problem. My application has very high QPS, and each request makes multiple Thrift calls, which use this native API: socketRead0.
So I decided to run an experiment. I made a mock server whose API sleeps 30 s before returning, and a client that calls this API. My goal was to observe the thread state while the network I/O is happening. Based on my thread dump, the thread state is RUNNABLE.
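The same observation can be reproduced without a mock HTTP server: a socket read with no data available blocks in native socketRead0, yet the JVM reports the thread as RUNNABLE rather than WAITING. A minimal sketch (class and thread names are illustrative):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockedReadState {
    public static void main(String[] args) throws Exception {
        // Local server that accepts the connection but never writes,
        // so the client's read() blocks indefinitely in socketRead0.
        ServerSocket server = new ServerSocket(0);
        Socket client = new Socket("localhost", server.getLocalPort());
        Socket peer = server.accept(); // keep the connection open, send nothing

        Thread reader = new Thread(() -> {
            try {
                client.getInputStream().read(); // blocks in native socketRead0
            } catch (IOException ignored) {
                // thrown when the socket is closed below
            }
        }, "blocked-reader");
        reader.start();

        Thread.sleep(500); // give the reader time to enter the blocking read
        // A thread blocked in native socket I/O is reported as RUNNABLE.
        System.out.println(reader.getState()); // prints RUNNABLE

        client.close(); // unblocks the reader with a SocketException
        reader.join();
        peer.close();
        server.close();
    }
}
```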
This explains two things:
- an application doing high-QPS blocking I/O will show a high CPU load value;
- your Java thread is still considered running inside the JVM, since its state is RUNNABLE, which contributes to the reported user-space CPU utilization.
Both of these will make your CPU appear busy.
I noticed during the experiment that system-space CPU utilization stayed low. I think this relates to the difference between how the JVM and the OS treat such threads. The HotSpot threading model is 1:1, meaning one JVM thread maps to one OS thread. When blocking I/O happens, such as in socketRead0, the kernel thread is put into state S (interruptible sleep) and does not occupy the CPU, while the user-space thread is blocked (waiting). When this happens, I think we need to rethink the fundamental I/O model in our application.
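One direction for such a rethink is java.nio, where a Selector lets a single thread multiplex many connections, so no thread parks inside a blocking socketRead0. A minimal sketch (not the answerer's setup; a toy server thread writes one message, and a non-blocking client reads it when the selector says data is ready):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        // Toy blocking server: accept one connection, write one message.
        ServerSocketChannel server = ServerSocketChannel.open()
                .bind(new InetSocketAddress("localhost", 0));
        new Thread(() -> {
            try (SocketChannel peer = server.accept()) {
                peer.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8)));
            } catch (IOException ignored) {}
        }).start();

        // Non-blocking client: the selector reports readiness, so the
        // thread never blocks inside a native read on one connection.
        Selector selector = Selector.open();
        SocketChannel client = SocketChannel.open();
        client.configureBlocking(false);
        client.connect(server.getLocalAddress());
        client.register(selector, SelectionKey.OP_CONNECT);

        ByteBuffer buf = ByteBuffer.allocate(64);
        boolean done = false;
        while (!done) {
            selector.select(); // one thread can watch many channels here
            for (SelectionKey key : selector.selectedKeys()) {
                SocketChannel ch = (SocketChannel) key.channel();
                if (key.isConnectable() && ch.finishConnect()) {
                    key.interestOps(SelectionKey.OP_READ);
                }
                if (key.isReadable() && ch.read(buf) > 0) {
                    done = true;
                }
            }
            selector.selectedKeys().clear();
        }
        buf.flip();
        System.out.println(StandardCharsets.UTF_8.decode(buf));
        client.close();
        server.close();
    }
}
```

With this model the RUNNABLE-but-blocked pattern from the thread dumps above disappears, because readiness is checked by the selector instead of each thread waiting on its own socket.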