
Monitoring the size of the Netty event loop queues

We've implemented monitoring for the Netty event loop queues in order to understand issues with some of our Netty modules. The monitor uses the io.netty.util.concurrent.SingleThreadEventExecutor#pendingTasks method, which works for most modules, but for a module that handles a few thousand HTTP requests per second it seems to hang, or to be very slow. I now realize that the docs explicitly warn that this can be an issue, and I feel pretty lame... so I'm looking for another way to implement this monitor.

You can see the old code here: https://github.com/outbrain/ob1k/blob/6364187b30cab5b79d64835131d9168c754f3c09/ob1k-core/src/main/java/com/outbrain/ob1k/common/metrics/NettyQueuesGaugeBuilder.java

  public static void registerQueueGauges(final MetricFactory factory, final EventLoopGroup elg, final String componentName) {

    int index = 0;
    for (final EventExecutor eventExecutor : elg) {
      if (eventExecutor instanceof SingleThreadEventExecutor) {
        final SingleThreadEventExecutor singleExecutor = (SingleThreadEventExecutor) eventExecutor;
        factory.registerGauge("EventLoopGroup-" + componentName, "EventLoop-" + index, new Gauge<Integer>() {
          @Override
          public Integer getValue() {
            return singleExecutor.pendingTasks();
          }
        });

        index++;
      }
    }
  }

My question is, is there a better way to monitor the queue sizes?

This can be quite a useful metric: it helps in understanding latency, and in some cases it can also be used to apply back-pressure.

Eran Harel asked Oct 04 '15 12:10


2 Answers

You'd probably need to track the changes as tasks are added to and removed from the SingleThreadEventExecutor instances.

To do that you could create a class that wraps and/or extends SingleThreadEventExecutor. It would hold a java.util.concurrent.atomic.AtomicInteger on which you call incrementAndGet() every time a new task is added and decrementAndGet() every time one is removed or finishes.

That AtomicInteger would then give you the current number of pending tasks. You could probably override pendingTasks() to use that value instead (though be careful there - I'm not 100% sure that wouldn't have side effects).

It would add a bit of overhead to every task being executed, but would make retrieving the number of pending tasks a near-constant-time operation.

The downside to this is of course that it's more invasive than what you are doing at the moment, as you'd need to configure your app to use different event executors.

NB. this is just a suggestion on how to work around the issue - I've not specifically done this with Netty. Though I've done this sort of thing with other code in the past.
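
As a rough illustration of the counting idea above, here is a minimal sketch that uses only the JDK. The class name CountingExecutor and its pendingTasks() method are hypothetical and are not part of Netty. It wraps any java.util.concurrent.Executor rather than extending SingleThreadEventExecutor, so it only sees tasks submitted through the wrapper; anything the event loop schedules internally would not be counted.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical wrapper (not a Netty class): counts tasks that were handed to
 * the delegate Executor through this wrapper and have not finished yet.
 */
final class CountingExecutor implements Executor {
    private final Executor delegate;
    private final AtomicInteger pending = new AtomicInteger();

    CountingExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        // Guard so each task is un-counted exactly once, whichever path fires first.
        AtomicBoolean released = new AtomicBoolean();
        Runnable release = () -> {
            if (released.compareAndSet(false, true)) {
                pending.decrementAndGet();
            }
        };

        pending.incrementAndGet(); // counted as soon as it is submitted
        try {
            delegate.execute(() -> {
                try {
                    task.run();
                } finally {
                    release.run(); // task finished, normally or with an exception
                }
            });
        } catch (RuntimeException | Error e) {
            release.run(); // delegate rejected the task (or ran it inline and it threw)
            throw e;
        }
    }

    /** Constant-time read of the number of submitted-but-unfinished tasks. */
    int pendingTasks() {
        return pending.get();
    }
}
```

A gauge could then return pendingTasks() from the wrapper instead of asking the event loop to count its queue. The per-task AtomicBoolean is only there to keep the counter correct when the delegate runs tasks inline; a version tailored to one specific executor could probably drop it.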

John Montgomery answered Nov 20 '22 00:11


Now, in 2021, Netty uses JCTools queues internally and pendingTasks() is very fast (almost always constant-time), so even though the Javadoc still describes this operation as slow, you can use it without concern. Previously the issue was that counting the elements in the queue was a linear-time operation, but that problem disappeared after the migration to the JCTools library.

Roman Fedorov answered Nov 20 '22 00:11