
Why does Storm replay a tuple from the spout instead of retrying on the crashed component?

I am using Storm for online processing, but I can't understand why Storm replays a tuple from the spout. Retrying on the component that crashed may be more efficient than replaying from the root, right? Can anyone help me? Thanks.

user1221244 asked Dec 26 '22 12:12


1 Answer

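A typical spout implementation replays only the failed tuples. As explained in Storm's "Guaranteeing Message Processing" documentation, a tuple emitted from the spout can trigger thousands of other tuples, and Storm builds a tree of tuples from them. A spout tuple is called "fully processed" when every message in its tree has been processed. To have Storm track a tuple, the spout attaches a message ID when emitting it, and Storm later passes that ID back to the spout to identify the tuple. (Anchoring is the related step in bolts, where an emitted tuple is linked to its input tuple so that it joins the tree.) A spout emits a tuple with a message ID in the following way: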

    _collector.emit(new Values("field1", "field2", 3) , msgId);
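To make "fully processed" concrete, here is a minimal plain-Java sketch of the definition above. It is only an illustration: `TupleTreeModel`, its method names, and the string tuple names are all hypothetical, and this is not how Storm itself tracks trees internally.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of a tuple tree; not part of Storm's API.
class TupleTreeModel {
    // For each spout message ID, the tuples in its tree that are not yet processed.
    private final Map<Object, Set<String>> outstanding = new HashMap<>();

    // Record that a tuple was emitted into the tree rooted at msgId.
    void emitted(Object msgId, String tupleName) {
        outstanding.computeIfAbsent(msgId, k -> new HashSet<>()).add(tupleName);
    }

    // Record that a tuple in the tree rooted at msgId finished processing.
    void processed(Object msgId, String tupleName) {
        Set<String> tree = outstanding.get(msgId);
        if (tree != null) {
            tree.remove(tupleName);
        }
    }

    // True once the tree is known and every tuple in it has been processed.
    boolean fullyProcessed(Object msgId) {
        Set<String> tree = outstanding.get(msgId);
        return tree != null && tree.isEmpty();
    }
}
```

In this model a single unprocessed tuple anywhere in the tree keeps the whole spout tuple from counting as fully processed, which is why the message ID on the root tuple is what gets acked or failed.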

The documentation linked above says:

A tuple is considered failed when its tree of messages fails to be fully processed within a specified timeout. This timeout can be configured on a topology-specific basis using the Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS configuration and defaults to 30 seconds.

If the tuple times out, Storm calls the fail method on the spout; likewise, on success it calls the ack method.

So at this point Storm tells you which tuples it failed to process. However, if you look at the source code you will see that the fail method in the BaseRichSpout class has an empty implementation, so you need to override it in your own spout to get replay capability in your application.
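As a rough sketch of what such an override needs to keep track of, the plain-Java class below models the bookkeeping only. `ReplayBookkeeping` and its method names are hypothetical and not part of Storm; the comments say which spout method each call would be made from.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical helper modelling the state a replaying spout keeps; not Storm's API.
class ReplayBookkeeping {
    // Tuples emitted but not yet acked, keyed by message ID.
    private final Map<Object, List<Object>> pending = new HashMap<>();
    // Message IDs of failed tuples waiting to be emitted again.
    private final Queue<Object> replayQueue = new ArrayDeque<>();

    // Call wherever the spout emits: remember the values under their message ID.
    void onEmit(Object msgId, List<Object> values) {
        pending.put(msgId, values);
    }

    // Call from ack(Object msgId): the tree completed, so forget the tuple.
    void onAck(Object msgId) {
        pending.remove(msgId);
    }

    // Call from fail(Object msgId): queue the tuple for another emit.
    void onFail(Object msgId) {
        if (pending.containsKey(msgId)) {
            replayQueue.add(msgId);
        }
    }

    // Call from nextTuple(): the next message ID to re-emit, or null if none.
    Object pollReplay() {
        return replayQueue.poll();
    }

    // Values originally emitted for this ID, or null once it has been acked.
    List<Object> valuesFor(Object msgId) {
        return pending.get(msgId);
    }

    // Number of tuples still awaiting an ack.
    int pendingCount() {
        return pending.size();
    }
}
```

In a spout that extends BaseRichSpout, nextTuple() would first check pollReplay() and, if it returns an ID, emit valuesFor(id) again with the same message ID via _collector.emit(values, msgId). A real implementation would probably also cap the number of retries per message so that a tuple that always fails is not replayed forever.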

user2720864 answered May 08 '23 13:05