
Profiling Netty Performance

Tags: java, linux, netty

I'm writing a Netty application. The application is running on a 64-bit, eight-core Linux box.

The Netty application is a simple router that accepts requests (incoming pipeline), reads some metadata from the request, and forwards the data to a remote service (outgoing pipeline).

This remote service will return one or more responses to the outgoing pipeline. The Netty application will route the responses back to the originating client (the incoming pipeline).

There will be thousands of clients. There will be thousands of remote services.

I'm doing some small-scale testing (ten clients, ten remote services) and I don't see the sub-10-millisecond latency I'm expecting at the 99.9th percentile. I'm measuring latency on both the client side and the server side.

I'm using a fully async protocol that is similar to SPDY. I capture the time (I just use System.nanoTime()) when we process the first byte in the FrameDecoder, and I stop the timer just before we call channel.write(). Measured this way, I see sub-millisecond times (99.9th percentile) from the incoming pipeline to the outgoing pipeline and vice versa.
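A minimal sketch of how such a measurement could be aggregated into a percentile, using only the JDK. The class name LatencyRecorder and its methods are hypothetical and not part of Netty; the nearest-rank percentile definition is an assumption, since the question does not say how the 99.9th percentile is computed.

```java
import java.util.Arrays;

/** Hypothetical helper: stores latency samples and reports a nearest-rank percentile. */
final class LatencyRecorder {
    private long[] samples = new long[1024];
    private int count;

    /** Record one latency in nanoseconds, e.g. System.nanoTime() - startNanos. */
    synchronized void record(long nanos) {
        if (count == samples.length) {
            samples = Arrays.copyOf(samples, count * 2); // grow the buffer when full
        }
        samples[count++] = nanos;
    }

    /** Nearest-rank percentile of the recorded samples; p must be in (0, 100]. */
    synchronized long percentile(double p) {
        if (count == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        // Multiply before dividing to limit floating-point rounding in the rank.
        int rank = (int) Math.ceil(p * count / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

In a handler, the start time would be taken with System.nanoTime() at the first decoded byte and the difference passed to record(...) just before channel.write(). Sorting on every query is acceptable for a test harness, though a production histogram would likely use fixed buckets instead.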

I also measured the time from the first byte in the FrameDecoder to when the ChannelFutureListener callback for the channel.write() above was invoked. That time was in the high tens of milliseconds (99.9th percentile), but I had trouble convincing myself that this was useful data.

My initial thought was that we had some slow clients. I watched channel.isWritable() and logged whenever it returned false. It did not return false under normal conditions.

Some facts:

  • We are using the NIO factories. We have not customized the worker count
  • We have disabled Nagle's algorithm (tcpNoDelay=true)
  • We have enabled keep alive (keepAlive=true)
  • CPU is idle 90+% of the time
  • Network is idle
  • The GC (CMS) is being invoked every 100 seconds or so for a very short amount of time
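One way to cross-check the GC observation against latency spikes is to sample the JVM's own collector counters around a measurement window. The sketch below uses the standard java.lang.management API; the class name GcSnapshot and its method names are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reads accumulated GC counters from the running JVM. */
final class GcSnapshot {
    /** Total accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Total number of collections across all collectors. */
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long n = gc.getCollectionCount(); // -1 when the collector does not report it
            if (n > 0) {
                total += n;
            }
        }
        return total;
    }
}
```

Taking a snapshot before and after a window that contains slow requests shows whether any collection happened in that window. If both deltas are zero, GC is unlikely to explain the tail latency in that window.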

Is there a debugging technique that I could follow to determine why my Netty application is not running as fast as I believe it should?

It feels like channel.write() adds the message to a queue, and we (application developers using Netty) don't have transparency into this queue. I don't know if the queue is a Netty queue, an OS queue, a network card queue, or something else. Anyway, I'm reviewing examples of existing applications and I don't see any anti-patterns that I'm following.

Thanks for any help/insight.

Jake Carr asked Jan 30 '13 20:01



1 Answer

By default, Netty creates Runtime.getRuntime().availableProcessors() * 2 worker threads, so 16 in your case. That means at most 16 channels can be processed at the same time; the other channels wait until your ChannelUpstreamHandler.handleUpstream / SimpleChannelHandler.messageReceived handlers return. So don't do heavy operations in these I/O threads, otherwise you can stall the other channels.
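The hand-off idea can be sketched without Netty, using plain JDK executors as a stand-in. Everything here is an assumption for illustration: OffloadSketch, dispatchWhileBlocked, and the single-threaded "I/O thread" are hypothetical names that only model a worker thread, and the slow work is simulated by waiting on a latch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of an I/O thread that hands slow work to a separate pool. */
final class OffloadSketch {
    /** The default worker count formula quoted in the answer. */
    static int defaultWorkerCount(int availableProcessors) {
        return availableProcessors * 2;
    }

    /**
     * Simulates one I/O thread receiving `messages` events. The slow part of each
     * event (modelled as waiting on a closed gate) is queued on a separate pool.
     * Returns how many events the I/O thread handled while the gate was still closed.
     */
    static int dispatchWhileBlocked(int messages, int appThreads) {
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        ExecutorService appPool = Executors.newFixedThreadPool(appThreads);
        CountDownLatch gate = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(messages);
        AtomicInteger dispatched = new AtomicInteger();
        try {
            for (int i = 0; i < messages; i++) {
                ioThread.execute(() -> {
                    // Hand the slow work to the application pool and return at once.
                    appPool.execute(() -> {
                        try {
                            gate.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        done.countDown();
                    });
                    dispatched.incrementAndGet();
                });
            }
            // Barrier task: completes only after the I/O thread handled every event.
            Runnable barrier = () -> { };
            ioThread.submit(barrier).get(5, TimeUnit.SECONDS);
            int handledWhileBlocked = dispatched.get();
            gate.countDown(); // now let the slow work finish
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("slow work did not finish in time");
            }
            return handledWhileBlocked;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ioThread.shutdownNow();
            appPool.shutdownNow();
        }
    }
}
```

Because the I/O thread only enqueues work, it gets through every event even though none of the slow tasks can complete yet. Had the wait run inline on the I/O thread, the first event would have blocked all the others, which is the stall the answer warns about.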

IgorL answered Nov 01 '22 18:11
