High CPU, possibly due to context switching?

One of our servers is experiencing a very high CPU load with our application. We've looked at various stats and are having issues finding the source of the problem.

One of the current theories is that there are too many threads involved and that we should try to reduce the number of concurrently executing threads. There's just one main thread pool, with 3000 threads, and a WorkManager working with it (this is Java EE - Glassfish). At any given moment, there are about 620 separate network IO operations that need to be conducted in parallel (use of java.NIO is not an option either). Moreover, there are roughly 100 operations that have no IO involved and are also executed in parallel.
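To make the shape of the problem concrete, here is a minimal sketch of what the setup amounts to in plain java.util.concurrent terms. This is an illustration only: the class and method names are made up, and the real pool is configured through the Glassfish WorkManager, which we cannot modify.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical illustration of the setup described above: a single
    // 3000-thread pool serving ~620 blocking network I/O tasks plus
    // ~100 purely CPU-bound tasks at any given moment, on a 2-vCPU VM.
    public class CurrentSetupSketch {
        static final ExecutorService MAIN_POOL = Executors.newFixedThreadPool(3000);

        static void submitIoTask(Runnable blockingNetworkCall) {
            MAIN_POOL.execute(blockingNetworkCall);
        }

        static void submitCpuTask(Runnable cpuBoundWork) {
            MAIN_POOL.execute(cpuBoundWork);
        }
    }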

This structure is not efficient, and we want to find out whether it is actually causing harm or is merely bad practice. The reason is that any change to this system is quite expensive (in man-hours), so we need some proof that there is an issue.

So now we're wondering whether context switching between threads is the cause, given that there are far more threads than the required concurrent operations. Looking at the logs, we see that on average about 14 different threads execute in any given second. If we take into account the existence of two CPUs (see below), that's 7 threads per CPU. This doesn't sound like too much, but we wanted to verify it.

So - can we rule out context switching or too-many-threads as the problem?

General Details:

  1. Java 1.5 (yes, it's old), running on CentOS 5, 64-bit, Linux kernel 2.6.18-128.el5
  2. There is only one single Java process on the machine, nothing else.
  3. Two CPUs, under VMware.
  4. 8GB RAM
  5. We don't have the option of running a profiler on the machine.
  6. We don't have the option of upgrading the Java, nor the OS.

UPDATE As advised below, we've captured load average (using uptime) and CPU usage (using vmstat 1 120) on our test server under various loads. We waited 15 minutes between each load change and its measurements to ensure that the system had stabilized around the new load and that the load average numbers had updated:

50% of the production server's workload: http://pastebin.com/GE2kGLkk

34% of the production server's workload: http://pastebin.com/V2PWq8CG

25% of the production server's workload: http://pastebin.com/0pxxK0Fu

CPU usage appears to drop as the load drops, but not drastically (going from 50% to 25% of the workload doesn't come close to halving CPU usage). Load average seems uncorrelated with the amount of workload.

There's also a question: given that our test server is also a VM, could its CPU measurements be affected by other VMs running on the same host (making the above measurements useless)?

UPDATE 2 Attaching a snapshot of the threads in three parts (due to pastebin size limits):

Part 1: http://pastebin.com/DvNzkB5z

Part 2: http://pastebin.com/72sC00rc

Part 3: http://pastebin.com/YTG9hgF5

Yon asked Mar 02 '12


4 Answers

Seems to me the problem is the 100 CPU-bound threads more than anything else. The 3000-thread pool is basically a red herring, as idle threads don't consume much of anything. The I/O threads are likely sleeping most of the time, since I/O happens on a geologic time scale compared to CPU operations.

You don't mention what the 100 CPU threads are doing, or how long they last, but if you want to slow down a computer, dedicating 100 threads of "run until the time slice says stop" will most certainly do it. Because you have 100 threads that are always ready to run, the machine will context switch as fast as the scheduler allows. There will be pretty much zero idle time. Context switching will have an impact because you're doing it so often. Since the CPU threads are (likely) consuming most of the CPU time, your I/O-bound threads end up waiting in the run queue longer than they wait for I/O. So even more threads are waiting (the I/O threads just give up the CPU more often, since they hit an I/O barrier quickly, which parks them until the I/O completes and lets the next one run).

No doubt there are tweaks here and there to improve efficiency, but 100 CPU threads are 100 CPU threads. Not much you can do there.
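For concreteness, a pool that keeps the CPU-bound side from oversubscribing the two CPUs would look roughly like the sketch below. This is a hedged illustration, not the poster's code: the class and method names are hypothetical, and it doesn't make the CPU-heavy work any cheaper, it only stops 100 runnable threads from thrashing the scheduler at once.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class CpuBoundPoolSketch {
        // Size the CPU-bound pool to roughly the number of cores (2 on this VM),
        // so at most ~2 CPU-heavy tasks run at once instead of 100 competing ones.
        private static final int CPU_POOL_SIZE = Runtime.getRuntime().availableProcessors();
        private static final ExecutorService CPU_POOL = Executors.newFixedThreadPool(CPU_POOL_SIZE);

        public static void submitCpuTask(Runnable task) {
            CPU_POOL.execute(task); // excess tasks queue up rather than thrash the scheduler
        }
    }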

Will Hartung answered Sep 21 '22


I think your constraints are unreasonable. Basically what you are saying is:

  1. I can't change anything
  2. I can't measure anything

Can you please speculate as to what my problem might be?

The real answer to this is that you need to hook a proper profiler to the application and you need to correlate what you see with CPU usage, Disk/Network I/O, and memory.

Remember the 80/20 rule of performance tuning: 80% of the gains will come from tuning your application. You might just have too much load for one VM instance, and it could be time to consider scaling horizontally, or vertically by giving the machine more resources. Or it could be that any one of the 3 billion JVM settings is out of line with your application's execution profile.

I assume the 3000-thread pool came from the famous "more threads = more concurrency = more performance" theory. The real answer is that a tuning change isn't worth anything unless you measure throughput and response time before and after the change and compare the results.

nsfyn55 answered Sep 23 '22


If you can't profile, I'd recommend taking a thread dump or two and seeing what your threads are doing. Your app doesn't have to stop to do it; see the links below and the minimal sketch after them:

  1. http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/threads.html
  2. http://java.net/projects/tda/
  3. http://java.sys-con.com/node/1611555
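On Linux you can also get a full thread dump without stopping the JVM by sending SIGQUIT (kill -3 <pid>); it lands in the process's stdout/stderr log. If it's easier to trigger a dump from inside the application (say, from an admin page), here is a minimal in-process sketch that works on Java 5, using only Thread.getAllStackTraces(); the class name is made up.

    import java.util.Map;

    public class ThreadDumpSketch {
        // Prints every live thread's name, state and stack trace to stderr.
        // Safe to call while the application keeps running.
        public static void dumpAllThreads() {
            Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
            System.err.println("=== Thread dump: " + traces.size() + " threads ===");
            for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
                Thread t = entry.getKey();
                System.err.println(t.getName() + " (state=" + t.getState() + ")");
                for (StackTraceElement frame : entry.getValue()) {
                    System.err.println("    at " + frame);
                }
            }
        }
    }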
duffymo answered Sep 19 '22


So - can we rule out context switching or too-many-threads as the problem?

I think your concerns over thrashing are warranted. A thread pool with 3000 threads (and 700+ concurrent operations) on a 2-CPU VMware instance certainly looks like it may be causing context-switching overhead and performance problems. Limiting the number of threads could give you a performance boost, although determining the right number will be difficult and will probably take a lot of trial and error.
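Since finding the right number is trial and error, one low-risk approach is to make the pool size externally tunable, so each test run only changes a JVM flag rather than code. A minimal sketch, assuming the pool can be created from a system property; the property name and default value are hypothetical.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class TunablePoolSketch {
        // Read the pool size from a system property, e.g. -Dworker.pool.size=750,
        // so each trial run (3000 vs 1500 vs 750 threads...) needs no code change.
        private static final int POOL_SIZE = Integer.getInteger("worker.pool.size", 750);
        public static final ExecutorService WORKERS = Executors.newFixedThreadPool(POOL_SIZE);
    }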

we need some proof of an issue.

I'm not sure the best way to answer but here are some ideas:

  • Watch the load average of the VM OS and the JVM. If you are seeing high load values (20+) then this is an indicator that there are too many things in the run queues.
  • Is there no way to simulate the load in a test environment so you can play with the thread pool numbers? If you run simulated load in a test environment with pool size of X and then run with X/2, you should be able to determine optimal values.
  • Can you compare high-load times of day with lower-load times of day? Can you graph the number of responses against latency during those times to see if you can spot a tipping point in terms of thrashing?
  • If you can simulate load, make sure you aren't just testing under the "drink from the fire hose" methodology. You need simulated load that you can dial up and down. Start at 10% and slowly increase the simulated load while watching throughput and latency. You should be able to see the tipping points by watching for throughput flattening out or otherwise deflecting (a minimal harness sketch follows this list).
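As a rough illustration of the "dial it up and down" approach, here is a minimal client-side harness sketch: it drives a simulated request at increasing concurrency and prints throughput and mean latency for each step, so a tipping point shows up as throughput flattening while latency keeps climbing. Everything in it (step sizes, durations, the simulatedRequest() placeholder) is hypothetical and would need to be wired to the real application.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    public class LoadRampSketch {
        public static void main(String[] args) throws Exception {
            // Ramp the simulated load up in steps, watching for the tipping point.
            for (int clients = 10; clients <= 200; clients += 10) {
                runStep(clients, 30000L); // 30 seconds per load level
            }
        }

        static void runStep(int clients, long durationMs) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(clients);
            final AtomicLong requests = new AtomicLong();
            final AtomicLong latencyNanos = new AtomicLong();
            final long deadline = System.currentTimeMillis() + durationMs;
            for (int i = 0; i < clients; i++) {
                pool.execute(new Runnable() {
                    public void run() {
                        while (System.currentTimeMillis() < deadline) {
                            long start = System.nanoTime();
                            simulatedRequest(); // replace with a real call into the application
                            latencyNanos.addAndGet(System.nanoTime() - start);
                            requests.incrementAndGet();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(durationMs + 10000L, TimeUnit.MILLISECONDS);
            double seconds = durationMs / 1000.0;
            System.out.printf("clients=%d  throughput=%.1f req/s  mean latency=%.1f ms%n",
                    clients, requests.get() / seconds,
                    (latencyNanos.get() / 1.0e6) / Math.max(1, requests.get()));
        }

        static void simulatedRequest() {
            // Placeholder: stands in for one request/response against the system under test.
            try { Thread.sleep(5); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }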
Gray answered Sep 21 '22