
Apache Storm: Track tuples by unique ID from Source Spout to Final Bolt

I want a method of uniquely identifying tuples throughout a whole Storm topology, so that each tuple can be tracked from Spout to the final Bolt.

The way I understand it, a unique message ID can be passed along with an emit from a spout, for example:

String msgID = UUID.randomUUID().toString();
// emits a line from user tasks with msg id
outputCollector.emit(new Values(task), msgID);

This ID is somehow returned to the Spout when the tuple is acked (can this be simulated earlier, to get the passed ID back at any point?). But calling getMessageId() on a tuple, for example:

inputTuple.getMessageId()

returns a new message ID generated by the tuple, not the one passed in at the Spout. Reference: https://groups.google.com/forum/#!topic/storm-user/xBEqMDa-RZs

Questions

1) Is there a way to get the value of tuple.getMessageId() at the point where the collector emits the tuple?

2) Alternatively, can the messageId passed in at the spout somehow be retrieved from the tuple at any spout or bolt in the topology?

End Solution: I want to be able to set an ID on a tuple when it is emitted, and then be able to identify that tuple again at any point in the Storm topology.

Or will the unique messageId that my system tracks have to be passed as a field/value on each output of each spout and bolt?

Thanks

perkss asked Jul 19 '15 15:07


1 Answer

It is not possible to access the system-generated IDs at the producer (only at the consumer, via tuple.getMessageId()). To track tuples the way you want, you need to (following your own idea) add the ID as a regular field value to the tuple and copy it in each bolt to the corresponding output tuple(s).
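
As a rough illustration of that idea, here is a minimal plain-Java sketch. It deliberately does not use the Storm API: the Tuple class, the method names, and the field names "trackingId" and "payload" are all hypothetical stand-ins chosen for this example. It only models the pattern of generating the ID once at the spout and copying it forward in every bolt.

```java
import java.util.UUID;

// Plain-Java model of the pattern. Tuple here is a hypothetical stand-in,
// NOT org.apache.storm.tuple.Tuple; no Storm classes are used.
class TrackingIdSketch {

    // Stand-in for a tuple with two regular fields: "trackingId" and "payload".
    static final class Tuple {
        final String trackingId;
        final String payload;

        Tuple(String trackingId, String payload) {
            this.trackingId = trackingId;
            this.payload = payload;
        }
    }

    // Spout side: generate the ID once and emit it as a regular field value.
    static Tuple spoutEmit(String task) {
        String trackingId = UUID.randomUUID().toString();
        return new Tuple(trackingId, task);
    }

    // Bolt side: read the ID from the input tuple and copy it to the output tuple.
    static Tuple upperCaseBolt(Tuple input) {
        return new Tuple(input.trackingId, input.payload.toUpperCase());
    }

    // A second bolt doing the same copy, so the ID survives more than one hop.
    static Tuple suffixBolt(Tuple input) {
        return new Tuple(input.trackingId, input.payload + "!");
    }

    public static void main(String[] args) {
        Tuple emitted = spoutEmit("user task");
        Tuple last = suffixBolt(upperCaseBolt(emitted));
        // The ID set at the spout is still available at the final bolt.
        System.out.println(emitted.trackingId.equals(last.trackingId));
        System.out.println(last.payload);
    }
}
```

In an actual topology the same shape would presumably be expressed by declaring the ID in the output fields (for example new Fields("trackingId", "task")), emitting new Values(trackingId, task), and reading it in each bolt with input.getStringByField("trackingId") before re-emitting it. The same value can still be passed as the message ID on the spout's emit if ack/fail callbacks are also needed.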

Matthias J. Sax answered Oct 14 '22 02:10