 

debugging JBoss 100% CPU usage

Originally posted on Server Fault, where it was suggested this question might be better asked here.

We are using JBoss to run two of our WARs. One is our web app, the other is our web service. The web app accesses a database on another machine and makes requests to the web service. The web service makes JMS requests to other machines, aggregates the data, and returns it.

At our biggest client, about once a month the JBoss Java process takes 100% of all CPUs. The machine running JBoss has 8 CPUs. Our web app is still accessible during this time, however pages take about 3 minutes to load. Restarting JBoss restores everything to normal.

The database machine and all the other machines are fine, only the machine running JBoss is affected. Memory usage is normal. Network utilization is normal. There are no suspect error messages in the JBoss logs.

I have set up a test environment as close as possible to the client's production environment, and I've done load testing with up to twice the number of concurrent users. I have not gotten my test environment to replicate the problem.

Where do we go from here? How can we narrow down the problem?

Currently the only plan we have is to wait until the problem occurs in production on its own, then do some debugging to determine the cause. So far people have just restarted JBoss when the problem occurred to minimize downtime. Next time it happens they will get a developer to take a look. The question is: next time it happens, what can be done to determine the cause?

We could set up a separate JBoss instance on the same box and install the web app separately from the web service. That way, when the problem next occurs, we will know which WAR has the problem (assuming it is our code). This doesn't narrow it down much, though.

Should I enable remote JMX? That way, the next time the problem occurs, I can connect with VisualVM and see which threads are taking the CPU and what the hell they are doing. However, is there a significant downside to enabling remote JMX in a production environment?
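From what I understand, enabling remote JMX would mean passing the standard com.sun.management.jmxremote system properties to the JVM, i.e. appending something like the following to JAVA_OPTS in bin/run.conf (the port is a placeholder, and I know authenticate=false and ssl=false would have to be locked down properly for production):

    # Appended to JAVA_OPTS, e.g. in JBoss's bin/run.conf (file location and port are assumptions).
    # WARNING: authenticate=false and ssl=false only keep the example short;
    # a production box would need password/access files or SSL, and a firewalled port.
    JAVA_OPTS="$JAVA_OPTS \
      -Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=9999 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false"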

Is there another way to see what threads are eating the CPU and to get a stacktrace to see what they are doing?

Any other ideas?

Thanks!

asked Mar 15 '10 by NateS


2 Answers

There's a quick and dirty way of identifying which threads are using up the CPU time on JBoss. Go to the JMX Console with a browser (usually at http://localhost:8080/jmx-console, but it may be different for you) and look for a bean called ServerInfo; it has an operation called listThreadCpuUtilization which dumps the actual CPU time used by each active thread in a nice tabular format. If one thread is misbehaving, it usually stands out like a sore thumb.

There's also the listThreadDump operation which dumps the stack for every thread to the browser.

Not as good as a profiler, but a much easier way to get the basic information. For production servers, where it's often bad news to connect a profiler, it's very handy.
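If you'd rather capture the same information from a shell than from the browser, the ServerInfo MBean should also be reachable through JBoss's twiddle script; roughly something like this (localhost:1099 and the exact syntax are assumptions and may vary between JBoss versions):

    # Run from the JBoss bin directory; localhost:1099 (the default JNDI port) is an assumption.
    ./twiddle.sh -s localhost:1099 invoke "jboss.system:type=ServerInfo" listThreadCpuUtilization
    ./twiddle.sh -s localhost:1099 invoke "jboss.system:type=ServerInfo" listThreadDump

That also makes it easy to redirect the output to a file the next time the problem shows up.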

answered by skaffman


This typically happens with runaway code or unsynchronized thread access to HashMaps. A simple thread dump (kill -3, as @disown says, or Ctrl-Break in a Windows console) will reveal this problem.

Since you're unable to reproduce it with tests, I think it smells like a concurrency issue; it's usually hard to make test scripts behave randomly enough to catch issues of this type.

I normally try to make it standard operating procedure to take thread dumps of any JVM that is restarted due to operational anomalies, and it's really a requirement for catching those once-a-month problems.
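As a rough sketch of that procedure on a Linux box (assuming a HotSpot JVM, the standard org.jboss.Main launcher, and console output redirected to a log file; the names and paths below are placeholders):

    # Find the JBoss java process.
    PID=$(pgrep -f org.jboss.Main)

    # Per-thread CPU usage; note the TIDs of the threads burning CPU.
    top -b -H -n 1 -p "$PID" | head -40

    # Request a thread dump; HotSpot writes it to the JVM's stdout, i.e. the JBoss console log.
    kill -3 "$PID"

    # Convert a busy TID from top to hex and look for its nid= entry in the dump.
    printf 'nid=0x%x\n' 12345                   # 12345 is a placeholder TID from top
    grep -A 20 'nid=0x3039' console.log         # log file name is an assumption

Taking two or three dumps a few seconds apart and comparing them usually makes a genuinely spinning thread (same stack every time) stand out from one that just happened to be busy.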

answered by krosenvold