weblogic.socket.Muxer uses 100% cpu

We've recently started experimenting with deployments in WebLogic 12c using the weblogic.Deployer utility. We can deploy an EAR fine, but whenever we try to undeploy that application while the Managed Server is still running, the server starts using 100% of our CPU (4-core Xeon, bare-metal).

After some tinkering and countless thread dumps, we isolated the problem to 4 stuck threads, each consuming 100% of one core. The load average would jump from around 0.10 to 4.00 within 5 minutes.

This is one of the threads that seem to be stuck:

"ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00007fb52801c800 nid=0x6bf0 runnable [0x00007fb58a0ad000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked <0x00000000e18c66d0> (a sun.nio.ch.Util$2)
        - locked <0x00000000e18c66c0> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000e18c6598> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
        at weblogic.socket.NIOSocketMuxer.selectFrom(NIOSocketMuxer.java:541)
        at weblogic.socket.NIOSocketMuxer.processSockets(NIOSocketMuxer.java:470)
        at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:30)
        at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:43)
        at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:147)
        at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:119)
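
For anyone who wants to repeat the isolation step: the nid field in a HotSpot thread dump is the native (OS) thread id in hex, so it can be matched against the decimal thread ids that a per-thread CPU view such as top -H shows on Linux. Below is a minimal Python sketch of that lookup, assuming the dump is plain jstack-style text like the one above. The function name is made up for illustration.

```python
import re

# Header line of one thread in a HotSpot thread dump, for example:
# "ExecuteThread: '3' ..." daemon prio=10 tid=0x... nid=0x6bf0 runnable [...]
HEADER = re.compile(r'^"(?P<name>.*)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def threads_by_native_id(dump_text):
    """Map decimal native thread id -> thread name for each thread in the dump."""
    result = {}
    for line in dump_text.splitlines():
        match = HEADER.match(line)
        if match:
            # nid is hexadecimal in the dump; per-thread CPU tools print decimal.
            result[int(match.group("nid"), 16)] = match.group("name")
    return result


sample = (
    "\"ExecuteThread: '3' for queue: 'weblogic.socket.Muxer'\" daemon prio=10 "
    "tid=0x00007fb52801c800 nid=0x6bf0 runnable [0x00007fb58a0ad000]\n"
    "   java.lang.Thread.State: RUNNABLE\n"
)
print(threads_by_native_id(sample))
```

For the thread shown above, nid=0x6bf0 corresponds to native thread id 27632, which is the number to look for in the per-thread CPU listing.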

I've seen many people with the same problem (not with WebLogic, though):

https://github.com/netty/netty/issues/327

https://issues.jboss.org/browse/XNIO-172

Why does select() consume so much CPU time in my program?

I don't think this is happening because of an old JDK version. java -version says:

java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

I googled a little bit but did not find anything on that. Do any WebLogic experts know what could be causing this problem?

Thanks a lot!

Gustavo Ramos asked Feb 22 '15 20:02


3 Answers

I faced the same issue. I managed to solve it by using the following settings:

1. Use the POSIX muxer:

set('MuxerClass', 'weblogic.socket.PosixSocketMuxer')

See WebLogic tuning

2. Add startup arguments:

-Djava.nio.channels.spi.SelectorProvider=sun.nio.ch.PollSelectorProvider -DUseSunHttpHandler=true
  • sun.nio.ch.PollSelectorProvider uses Linux poll instead of epoll_wait

  • -DUseSunHttpHandler=true bypasses WebLogic's own HTTP socket implementation
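
The set(...) call in step 1 only works inside a WLST edit session. A rough sketch of the surrounding steps follows, assuming WLST is already connected to the Admin Server. The function name and the server name 'ms1' are made up for illustration, and the muxer change most likely needs a server restart to take effect.

```python
# WLST (Jython) sketch. edit(), startEdit(), cd(), set(), save(), activate()
# and cancelEdit() are WLST built-ins, so this only does real work inside a
# WLST session that is already connected to the Admin Server.

def use_posix_muxer(server_name, muxer_class='weblogic.socket.PosixSocketMuxer'):
    """Set MuxerClass on one server inside its own edit session."""
    edit()
    startEdit()
    try:
        # The server name is an assumption; use your Managed Server's name.
        cd('/Servers/%s' % server_name)
        set('MuxerClass', muxer_class)
        save()
        activate()
    except:
        # Release the edit lock instead of leaving a half-finished session.
        cancelEdit('y')
        raise
```

Inside WLST this would be called as use_posix_muxer('ms1') after connect(...).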

Omar MEBARKI answered Nov 15 '22 20:11


After much tinkering, an almost sleepless night and googling till I bled, I'm almost sure I got it solved.

This solution is heavily based on another thread: https://stackoverflow.com/a/7827952/1484232

To summarize the whole shebang, colliding GC threads were (most likely) causing the issue here. After adding the following parameters to my JVM, the problem went away:

-XX:+UseConcMarkSweepGC 
-XX:+UseParNewGC 
-XX:ParallelCMSThreads=2 
-XX:+CMSParallelRemarkEnabled 
-XX:+CMSIncrementalMode 
-XX:+CMSIncrementalPacing 
-XX:CMSFullGCsBeforeCompaction=1 
-XX:+CMSClassUnloadingEnabled 
-XX:CMSInitiatingOccupancyFraction=80

If anyone ever runs into the same trouble, these settings are worth a try.

Cheers.

Gustavo Ramos answered Nov 15 '22 21:11


This is a known issue with WebLogic 12c, and it is documented in the following Oracle Support document:

Performance Issue Due To weblogic.socket.NIOSocketMuxer Usage In WLS 12.1.2+ (Doc ID 2128032.1)

The workaround it gives is to switch to a native muxer class, as described in the answer from Omar MEBARKI.

The article does not address any of the workarounds mentioned in the other answers here.

Mike answered Nov 15 '22 22:11