
Execution flow of a storm program

Tags: apache-storm

I am new to Storm and am trying to understand the order in which the different methods are executed, from spout to bolt. A spout has methods like

nextTuple()

open()

declareOutputFields()

activate()

deactivate()

and a bolt has methods like

prepare()

execute()

cleanup()

declareOutputFields()

Can anyone tell me the sequence in which these methods are executed?

asked Mar 11 '15 07:03 by u12345




1 Answer

First, when your topology is started...

  1. Create Spouts and Bolts
  2. declareOutputFields
  3. Spouts/Bolts serialized and assigned to workers
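One consequence of step 3 is that whatever the constructor sets must survive serialization, while state built in open or prepare exists only on the worker. The following is a minimal sketch of that effect using plain Java serialization. SketchSpout and SubmitSketch are hypothetical stand-ins written for this illustration, not the real Storm interfaces, and the round trip only imitates what happens between submission and the worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a spout; NOT the real Storm interface.
class SketchSpout implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sourceName;        // set in the constructor, survives serialization
    private transient List<String> buffer;  // created in open(), never serialized

    SketchSpout(String sourceName) {
        this.sourceName = sourceName;
    }

    // Called before serialization, when the topology is being built.
    List<String> declareOutputFields() {
        return Arrays.asList("word");
    }

    // Called once on the worker, after deserialization.
    void open() {
        buffer = new ArrayList<>();
    }

    boolean isOpen() {
        return buffer != null;
    }

    String sourceName() {
        return sourceName;
    }
}

class SubmitSketch {
    // Serializes and deserializes the spout, imitating the hop to a worker.
    static SketchSpout roundTrip(SketchSpout spout) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(spout);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchSpout) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SketchSpout spout = new SketchSpout("demo-source");  // 1. create
        System.out.println(spout.declareOutputFields());     // 2. declareOutputFields
        SketchSpout onWorker = roundTrip(spout);             // 3. serialize and ship
        System.out.println("open before open(): " + onWorker.isOpen());
        onWorker.open();
        System.out.println("open after open(): " + onWorker.isOpen());
    }
}
```

In this sketch the constructor argument arrives intact on the "worker" copy, while the transient buffer stays null until open runs. That is why non-serializable resources such as connections are usually created in open or prepare rather than in the constructor.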

Second, in each worker somewhere on the cluster...

  1. Spouts open and Bolts prepare (happens once)
  2. In a loop...
    • Storm calls ack, fail, and nextTuple on each spout
    • Storm calls execute on each bolt, once per incoming tuple
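The worker-side sequence can be sketched as a small driver that records the order of calls. This is a simplified, single-threaded illustration under stated assumptions: MiniSpout, MiniBolt, and WorkerLoopSketch are hypothetical names and not Storm classes, and in a real topology ack and fail arrive asynchronously rather than immediately after execute.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical mini-runtime; none of these types are the real Storm API.
class WorkerLoopSketch {
    interface MiniSpout {
        void open();
        String nextTuple();
        void ack(String tuple);
        void fail(String tuple);
    }

    interface MiniBolt {
        void prepare();
        boolean execute(String tuple);  // true means the tuple was processed successfully
    }

    // Runs the lifecycle over the given input and returns the recorded call order.
    static List<String> run(List<String> input) {
        List<String> calls = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>(input);

        MiniSpout spout = new MiniSpout() {
            public void open() { calls.add("open"); }
            public String nextTuple() { calls.add("nextTuple"); return pending.poll(); }
            public void ack(String tuple) { calls.add("ack:" + tuple); }
            public void fail(String tuple) { calls.add("fail:" + tuple); }
        };
        MiniBolt bolt = new MiniBolt() {
            public void prepare() { calls.add("prepare"); }
            public boolean execute(String tuple) {
                calls.add("execute:" + tuple);
                return !tuple.equals("bad");  // pretend "bad" tuples fail processing
            }
        };

        // Step 1: happens once.
        spout.open();
        bolt.prepare();

        // Step 2: the loop, ending here when the spout has nothing left to emit.
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            if (bolt.execute(tuple)) {
                spout.ack(tuple);
            } else {
                spout.fail(tuple);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        run(Arrays.asList("a", "bad")).forEach(System.out::println);
    }
}
```

A real spout's nextTuple keeps being called even when there is nothing to emit; the sketch stops at the first empty poll only so that it terminates.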

If your topology is deactivated...

  • Your spouts' deactivate method will be called. When you activate the topology again, activate will be called.

If your topology is killed...

  • Spouts might have close called
  • Bolts might have cleanup called

Note:

There is no guarantee that close will be called, because the supervisor kill -9's worker processes on the cluster. (source)
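Since the shutdown hooks may never run, one common way to cope is to avoid deferring important work to them. The sketch below illustrates that idea with a bolt-like class that flushes its buffer every few tuples and treats cleanup as best effort only. BatchingBoltSketch, its flushEvery parameter, and the in-memory sink are all hypothetical, chosen for this illustration; this is one possible design, not something prescribed by Storm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bolt-like class; not the real Storm API.
class BatchingBoltSketch {
    private final int flushEvery;
    private final List<String> buffer = new ArrayList<>();
    private final List<String> sink = new ArrayList<>();  // stands in for an external store

    BatchingBoltSketch(int flushEvery) {
        this.flushEvery = flushEvery;
    }

    // Flushes as part of normal processing, so at most flushEvery - 1
    // tuples are buffered if the process is killed without warning.
    void execute(String tuple) {
        buffer.add(tuple);
        if (buffer.size() >= flushEvery) {
            flush();
        }
    }

    // Best effort only: may never be called on a cluster.
    void cleanup() {
        flush();
    }

    private void flush() {
        sink.addAll(buffer);
        buffer.clear();
    }

    List<String> sink() {
        return sink;
    }

    int unflushed() {
        return buffer.size();
    }

    public static void main(String[] args) {
        BatchingBoltSketch bolt = new BatchingBoltSketch(2);
        bolt.execute("a");
        bolt.execute("b");
        bolt.execute("c");
        System.out.println("written: " + bolt.sink() + ", still buffered: " + bolt.unflushed());
    }
}
```

If the worker were killed after the third tuple in this example, only the single buffered tuple would be missing from the sink, whereas a design that flushed solely in cleanup would have written nothing.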

answered Nov 08 '22 12:11 by Kit Menke