Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SelectorImpl is BLOCKED

I use a lot of client sends a request to the server about 1000 requests per second a client, the server's CPU soon rose to 600% (8 cores), and always maintain this state. When I use jstack printing process content, I found SelectorImpl is BLOCKED state. Records are as follows:

nioEventLoopGroup-4-1 prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
    - locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
    - locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(Unknown Source)
    at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
    at java.lang.Thread.run(Unknown Source)

High CPU has something to do with this? Another problem is that when I connect a lot of clients, find some client will connect, an error is as follows:

"nioEventLoopGroup-4-1" prio=10 tid=0x00007fef28001800 nid=0x1dbf waiting for monitor entry [0x00007fef9eec7000]
java.lang.Thread.State: BLOCKED (on object monitor)
at sun.nio.ch.EPollSelectorImpl.doSelect(Unknown Source)
- waiting to lock <0x00000000c01f1af8> (a java.lang.Object)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x00000000c01d9420> (a io.netty.channel.nio.SelectedSelectionKeySet)
- locked <0x00000000c01f1948> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c01d92c0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:635)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:319)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
at java.lang.Thread.run(Unknown Source)

Generate client is accomplished by using a thread pool, and has set up a connection timeout, but why frequent connection timeout? Is to serve the cause of the suit?

    public void run() {

    System.out.println(tnum + " connecting...");
    try {
        Bootstrap bootstrap = new Bootstrap();
        bootstrap.group(group)
        .channel(NioSocketChannel.class)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 30000)
        .handler(loadClientInitializer);

        // Start the connection attempt.
        ChannelFuture future = bootstrap.connect(host, port);
        future.channel().attr(AttrNum).set(tnum);
        future.sync();
        if (future.isSuccess()) {
            System.out.println(tnum + " login success.");
            goSend(tnum, future.channel());
        } else {
            System.out.println(tnum + " login failed.");
        }
    } catch (Exception e) {
        XLog.error(e);
    } finally {

// group.shutdownGracefully(); }

}
like image 371
gdpencil Avatar asked Aug 08 '13 07:08

gdpencil


1 Answers

High CPU has something to do with this?

It might be. I'd diagnose this problem following way (on a Linux box):

Find threads which are eating CPU

Using pidstat I'd find which threads are eating CPU and in what mode (user/kernel) time is spent.

$ pidstat -p [java-process-pid] -tu 1 | awk '$9 > 50'

This command shows threads eating at least 50% of CPU time. You can inspect what those threads are doing using jstack, VisualVM or Java Flight Recorder.

If CPU-hungry threads and BLOCKED threads are the same, CPU usage seems to do something with contention.

Find reason for connection timeout

Basically you will get connection timeout if two OS'es can't finish TCP-handshake in a given time. Several reasons for this:

  • network link saturation. Can be diagnosed using sar -n DEV 1 and comparing rxkB/s and txkB/s columns to your link maximum throughput.
  • server (Netty) doesn't respond with accept() call in given timeout. This thread can be BLOCKED or starving for CPU time. You can find which threads are calling accept() (therefore finishing TCP-handshake) using strace -f -e trace=accept -p [java-pid]. And after that check for possible reasons using pidstat/jstack.

Also you can find number of received requests for connection open (but not confirmed) with netstat -an | grep -c SYN_RECV

like image 130
Denis Bazhenov Avatar answered Oct 11 '22 10:10

Denis Bazhenov