In my spring boot service, I am validating incoming orders based upon order details and customer details.
In customer details, I have different lists of objects like Services, Attributes, Products, etc. and for every list, I am doing something like below:
products.stream()
.filter(Objects::nonNull)
.map(Product::getResource)
.filter(Objects::nonNull)
.filter(<SimplePredicate>)
.collect(Collectors.toList());
I am using streams like this many times for products, services and attributes. Performance-wise we observed a high TPS and very reasonable memory usage, but CPU consumption is very high. We run the service in Kubernetes pods, and it uses about 90% of the CPU allocated to the pod.
One more interesting observation: the more CPU we allocate, the higher the TPS we achieve, and CPU usage again climbs to about 90%.
Is this because Streams consume more CPU? Or is it because of heavy garbage collection, since the intermediate objects created in each Stream pipeline may be garbage collected after every iteration?
EDIT-1:
Upon further investigation with load testing, we observed the following statistics for TPS vs. CPU usage under different CPU/thread configurations (CPU is in millicores):
CPU: 1500m, Threads: 70
| TPS | 176  | 140  | 125  | 79   | 63   |
|-----|------|------|------|------|------|
| CPU | 1052 | 405  | 201  | 84   | 13   |

CPU: 1500m, Threads: 35
| TPS | 500  | 510  | 500  | 530  |
|-----|------|------|------|------|
| CPU | 1172 | 1349 | 1310 | 1214 |

CPU: 2500m, Threads: 70
| TPS | 20   | 20   | 25   | 28   | 26   |
|-----|------|------|------|------|------|
| CPU | 2063 | 2429 | 2303 | 879  | 35   |

CPU: 2500m, Threads: 35
| TPS | 1193 | 1200 | 1200 | 1230 |
|-----|------|------|------|------|
| CPU | 600  | 1908 | 2044 | 1949 |
Tomcat Configuration Used:
server.tomcat.max-connections=100
server.tomcat.max-threads=100
server.tomcat.min-spare-threads=5
EDIT-2:
The thread dump analysis shows that 80% of the http-nio threads are in the "waiting on condition" state. That means most of the threads are waiting for something and not consuming any CPU, which explains the low CPU usage. But what could be causing the threads to wait? I'm not making any asynchronous calls in the service, and I'm not using parallel streams either, only sequential streams as shown above.
The following is the Thread dump when CPU and TPS go down:
"http-nio-8090-exec-72" #125 daemon prio=5 os_prio=0 tid=0x00007f014001e800 nid=0x8f waiting on condition [0x00007f0158ae1000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000d7470b10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:89)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:33)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1073)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
Is this because Streams consume more CPU? Or is it because of heavy garbage collection, since the intermediate objects created in each Stream pipeline may be garbage collected after every iteration?
Clearly streams do consume CPU. And generally speaking, code implemented using non-parallel streams does run a bit slower than code implemented using old-fashioned loops. However, the difference in performance is not huge. (Maybe 5 or 10%?)
In general, a stream does not generate more garbage than an old-fashioned loop performing the same computation. For instance if we compared your example with a loop doing the same thing (i.e. generating a new list), then I would expect there to be a 1-to-1 correspondence between the memory allocations for the two versions.
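To make that concrete, here is a minimal sketch comparing your pipeline with the equivalent loop. Product, Resource and isActive() are placeholder types standing in for your real classes and for <SimplePredicate>; only the shape of the two versions matters:

import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Placeholder domain classes, just so the sketch compiles on its own.
class Resource {
    boolean isActive() { return true; }   // stand-in for <SimplePredicate>
}

class Product {
    Resource getResource() { return new Resource(); }
}

class StreamVsLoop {

    // Stream version, as in the question.
    static List<Resource> viaStream(List<Product> products) {
        return products.stream()
                .filter(Objects::nonNull)
                .map(Product::getResource)
                .filter(Objects::nonNull)
                .filter(Resource::isActive)
                .collect(Collectors.toList());
    }

    // Old-fashioned loop doing the same thing. It allocates the same result
    // list and keeps the same elements, so the garbage profile is essentially
    // identical to the stream version.
    static List<Resource> viaLoop(List<Product> products) {
        List<Resource> result = new ArrayList<>();
        for (Product product : products) {
            if (product == null) {
                continue;
            }
            Resource resource = product.getResource();
            if (resource != null && resource.isActive()) {
                result.add(resource);
            }
        }
        return result;
    }
}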
In short, I don't think streams are directly implicated in this. Obviously, if your service is processing a lot of lists (using streams or loops) for each request, then that will affect the TPS, and even more so if the lists are actually fetched from your backend database. But that's normal too. This could be addressed by doing things like request caching, and by tweaking the granularity of API requests so that you don't compute expensive results the caller doesn't actually need.
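For example, if the same customer's lists are loaded and filtered repeatedly, request caching with Spring's @Cacheable could avoid redoing that work. This is only a sketch under assumptions: CustomerDetails, CustomerRepository, findDetailsById and the cache name are hypothetical, and it needs @EnableCaching plus a cache provider on the classpath:

import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class CustomerDetailsService {

    // Hypothetical repository standing in for whatever backend call loads the lists.
    private final CustomerRepository customerRepository;

    public CustomerDetailsService(CustomerRepository customerRepository) {
        this.customerRepository = customerRepository;
    }

    // Results for the same customerId are served from the "customerDetails" cache
    // instead of being re-fetched and re-filtered on every request.
    @Cacheable("customerDetails")
    public CustomerDetails loadCustomerDetails(String customerId) {
        return customerRepository.findDetailsById(customerId);
    }
}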
(I would NOT recommend adding parallel() to your streams in this scenario. Since your service is already compute (or swap) bound, there are no "spare" cycles to run the streams in parallel. Using parallel() here is likely to reduce your TPS.)
The second part of your question is about performance (TPS) versus thread count versus (we think) vCPUs. It is not possible to interpret the results you have given because you don't explain the units of measurement, and because I suspect there are other factors in play.
However, as a general rule, running more threads than you have (v)CPUs to schedule them on does not increase throughput; beyond a point the extra context switching and contention actually reduce it. Your numbers are consistent with that: with the same CPU allocation, 35 threads achieve a far higher TPS than 70 threads.
It is also possible that there are effects attributable to your cloud platform. For example, if you are running in a virtual server on a compute node with lots of other virtual servers, you may not get a full CPU's worth of processing per vCPU. And if your virtual server is generating a lot of swap traffic, that will most likely reduce your server's share of the CPU resources even further.
We cannot say what is actually causing your problem, but if I were in your shoes I would be looking at the Java GC logs, and using OS tools like vmstat and iostat to look for signs of excessive paging and excessive I/O in general.
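For example (this assumes a Java 8 HotSpot JVM, which your thread dump suggests, a Linux node with the sysstat tools installed, and a hypothetical your-service.jar):

# Enable GC logging (Java 8 HotSpot flags; on Java 9+ use -Xlog:gc* instead)
java -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/gc.log -jar your-service.jar

# Report memory, swap and CPU statistics every 5 seconds
vmstat 5

# Report extended per-device I/O statistics every 5 seconds
iostat -x 5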