Apache Storm Bolt task is not receiving message after some time

We have a Storm topology with one spout and two bolts. The spout continuously queries data from a database and sends tuples to the first bolt. The first bolt does some processing and sends tuples to the second bolt, which calls a third-party web service to deliver the data. After some time, the last bolt stops receiving tuples; if we restart the topology, it works fine again. Only the last bolt is affected. The spout and the first bolt keep running fine. I am not using the acking framework, and I have configured only one worker.

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("messageListenrSpout", new MessageListenerSpout(), 1);
builder.setBolt("processorBolt", new ProcessorBolt(), 20).shuffleGrouping("messageListenrSpout");
builder.setBolt("notifierBolt", new NotifierBolt(), 40).shuffleGrouping("processorBolt");

Config conf = new Config();
conf.put(Config.TOPOLOGY_SLEEP_SPOUT_WAIT_STRATEGY_TIME_MS, 10000);
//conf.setMessageTimeoutSecs(600);
conf.setDebug(true);
StormSubmitter.submitTopology(TOPOLOGY, conf, builder.createTopology());
Sumit Gupta asked Nov 01 '15 22:11


2 Answers

It's quite likely that a backlog of tuples is causing timeouts. Try increasing the parallelism hint for the second bolt, since its processing time sounds much longer than that of the first bolt (which is why a backlog would build up in front of the second bolt). If you're running this topology on a cluster, look at the Storm UI to see the specifics.
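
One rough way to choose that parallelism hint, offered as a sketch and not as anything Storm provides: by Little's law, the number of executors a bolt needs is roughly the tuple arrival rate multiplied by the time spent per tuple. The class name, method name, and numbers below are hypothetical; real values would come from the Storm UI.

```java
// Sizing sketch only. Nothing here is a Storm API, and the numbers are made up.
final class ParallelismEstimate {

    /**
     * Little's law: tuples in flight = arrival rate * time per tuple.
     * An executor works on one tuple at a time, so this is also the
     * minimum number of executors needed to keep up with the input.
     */
    static int requiredExecutors(double tuplesPerSecond, double secondsPerTuple) {
        if (tuplesPerSecond < 0 || secondsPerTuple < 0) {
            throw new IllegalArgumentException("rate and latency must be non-negative");
        }
        return (int) Math.ceil(tuplesPerSecond * secondsPerTuple);
    }

    public static void main(String[] args) {
        // Assumption: 100 tuples/s reach the notifier bolt and each
        // web-service call takes 0.5 s on average.
        int needed = requiredExecutors(100.0, 0.5);
        System.out.println("executors needed: " + needed); // prints "executors needed: 50"
    }
}
```

With these assumed numbers the estimate is 50, which is above the 40 configured for notifierBolt in the question, so a backlog would keep growing in front of it.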

Chris Gerken answered Oct 10 '22 10:10


While debugging my topology, I found the cause: the spout emits messages quickly, but the bolt processes them slowly. In that case the messages queue up in the LMAX Disruptor queue, and the spout task then waits for the queue to drain. If you take a thread dump, you will find threads in the TIMED_WAITING state. So we need to configure the topology so that its inflow and outflow stay balanced.
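
The effect can be reproduced outside Storm with a small analogy. This is a minimal sketch that uses a bounded java.util.concurrent queue in place of the Disruptor queue, which is an assumption made for illustration and not how Storm is implemented. A fast producer stands in for the spout and a slow consumer for the bolt; all names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Analogy only: a bounded JDK queue stands in for Storm's internal queue.
final class BackpressureDemo {

    /** Runs a fast producer against a slow consumer and reports the producer's thread state. */
    static Thread.State observeProducerState() {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);

        // "Spout": emits as fast as it can, waiting with a timeout when the queue is full.
        Thread producer = new Thread(() -> {
            try {
                int i = 0;
                while (!Thread.currentThread().isInterrupted()) {
                    queue.offer(i++, 10, TimeUnit.SECONDS);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "fast-spout");

        // "Bolt": takes 200 ms per item, so the queue stays full.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    queue.take();
                    Thread.sleep(200);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "slow-bolt");

        producer.setDaemon(true);
        consumer.setDaemon(true);
        producer.start();
        consumer.start();

        try {
            // Poll for up to 5 s until the producer is seen parked on the full queue.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            Thread.State observed = producer.getState();
            while (observed != Thread.State.TIMED_WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
                observed = producer.getState();
            }
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing", e);
        } finally {
            producer.interrupt();
            consumer.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("producer state: " + observeProducerState());
    }
}
```

Once the queue fills, the producer spends nearly all of its time parked in the timed offer, which is the same TIMED_WAITING pattern a thread dump shows. Slowing the producer or adding consumers brings the inflow and outflow back into balance.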

Sumit Gupta answered Oct 10 '22 11:10