Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JBoss threads waiting on random monitor

I'm doing some load testing on a web app deployed in JBoss. It starts up fine, but as the test ramps up and more simulated users start hitting JBoss, performance degrades severely:

Resposne time chart

Connecting VisualVM to it, I can see the threads were all fine, then suddenly started spending most of their time waiting for a monitor (green is running, red is monitor, yellow is wait):

Thread state graph

Running jstack, I see the threads are all waiting in the same place:

"http-0.0.0.0-8080-172" daemon prio=6 tid=0x000000005da90000 nid=0xd2c waiting for monitor entry [0x000000006cb4e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.log4j.Category.callAppenders(Category.java:185)
    - waiting to lock  (a org.apache.log4j.spi.RootCategory)
    at org.apache.log4j.Category.forcedLog(Category.java:372)
    at org.apache.log4j.Category.debug(Category.java:241)
    [my code]

Most of the ~200 HTTP processor threads are waiting for the same monitor. Looking at log4j.xml for the WAR, it has a single appender setup for CONSOLE. I delete the appender and try my test again. Same behavior, except jstack shows all the threads waiting in a different place:

"http-0.0.0.0-8080-251" daemon prio=6 tid=0x0000000059811800 nid=0x1108 waiting for monitor entry [0x0000000073ebe000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.Hashtable.get(Hashtable.java:333)
    - waiting to lock  (a org.jboss.util.property.PropertyMap)
    at java.util.Properties.getProperty(Properties.java:932)
    at org.jboss.util.property.PropertyMap.getProperty(PropertyMap.java:626)
    at java.lang.System.getProperty(System.java:653)
    at org.jaxen.saxpath.helpers.XPathReaderFactory.createReader(XPathReaderFactory.java:109)
    at org.jaxen.BaseXPath.(BaseXPath.java:124)
    at org.jaxen.BaseXPath.(BaseXPath.java:153)
    at nu.xom.JaxenConnector.(JaxenConnector.java:49)
    at nu.xom.Node.query(Node.java:424)
    [my code]

Changing nothing, I restart JBoss, run the test, then run jstack once it gets slow. All the threads are waiting in yet a different place:

"http-0.0.0.0-8080-171" daemon prio=6 tid=0x000000005d0d1000 nid=0x15d4 waiting for monitor entry [0x000000006cb4e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at sun.nio.cs.FastCharsetProvider.charsetForName(FastCharsetProvider.java:118)
    - waiting to lock  (a sun.nio.cs.StandardCharsets)
    at java.nio.charset.Charset.lookup2(Charset.java:449)
    at java.nio.charset.Charset.lookup(Charset.java:437)
    at java.nio.charset.Charset.isSupported(Charset.java:479)
    at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:49)
    at java.io.InputStreamReader.(InputStreamReader.java:57)
    at java.io.FileReader.(FileReader.java:41)
    [my code]

What in the heck is going on? I have used jstack in the past and I tried running it when things are running fine and got the expected results. I assume jstack is fine. Any ideas what could cause such weird behavior? Any ideas on where to go from here?

like image 438
NateS Avatar asked Jun 29 '10 02:06

NateS


2 Answers

This sort of behaviour is to be expected. As you scale up a load test, you're always going to find bottlenecks, and in a complex system, those bottlenecks are going to shift around.

Your job is to identify those bottlenecks and try to fix them, one at a time. Each time you do, you'll always find another one, but hopefully the system will scale better each time. It's not easy, but then scaling for load isn't easy.

  • Take your 1st example. You have lots of calls to log4j's Logger.debug() method. Log4j doesn't perform well when logging under load, so you need to take some precautions. Even if your log4j config says "do not log DEBUG messages", log4j still has to do some work before realising this. The recommended approach to handle to is to wrap every Logger.debug() call in a if Logger.isDebugEnabled() { Logger.debug(); }` block. This should shift that particular bottleneck.

  • In your 2ndexample, you're calling XOM's Node.query() method. This method has to recompile the XPath expression on every call, and this seems to be a bottleneck. Find an API where you can pre-compile the XPath expression and re-use it.

  • In the 3rd example, you're reading a File. This isn't a good idea in a high-load system, file-io is slow when you're doing a large number of small operations. Consider re-implementing this to work a different way if you can.

All of these are unrelated, but all present performance bottlenecks you'll see when scaling for load. You'll never get rid of them all, but hopefully you can get it to a point where it's good enough.

like image 181
skaffman Avatar answered Nov 07 '22 09:11

skaffman


I set up the application in Tomcat running through Eclipse and did not see the problem. Eventually I found we were starting JBoss using a 32-bit Windows service wrapper, even though we were using a 64-bit JDK. The machine was 64-bit. I'm not sure how this would even work? At any rate, changing to a 32-bit JDK caused the crazy problem to go away and I was able to move on with my life.

like image 40
NateS Avatar answered Nov 07 '22 09:11

NateS