It's mentioned in the storm documentation, that storm replays tuple which processing has timed out. My question is if the storm do this automatically (without calling fail() on the origin spout) or is this rather responsibility of the origin spout to replay the tuple (the fail() is called and replay should be implemented inside or even somewhere externally)?
In order to have a proper replay on a timeout, you must anchor the tuple with an id when you emit it from the spout. When the timeout occurs, whatever you used as an anchor is returned to the fail method (fail(object anchorId)). Now you can use the anchorId of the failed/timedout tuple to replay or anything else you want to do with the timeout tuple. Each anchor id must be unique. An example of an anchor id is a database id. When you tuple fails, you can use the databse id to recreate your tuple and re-emit it. So to answer your question you must have your replay logic inside the fail and you can use the anchorId to recreate your tuple. Hope this info helps
From http://storm.apache.org/documentation/Guaranteeing-message-processing.html,
if the tuple times-out Storm will call the
fail
method on theSpout
So yes, fail
will be called.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With