 

Using KafkaSpout, ack-ing a tuple twice causes timeouts?

Tags:

apache-storm

My topology uses the default KafkaSpout implementation. In some tightly controlled testing, I noticed the spout was failing tuples even though none of my bolts were failing any tuples, and I was certain all messages were being fully processed well within my configured timeout.

I also noticed that, because of the way my bolts were subclassed, one of my bolts was ack-ing tuples twice. When I fixed this, the spout stopped failing tuples.

Sorry that this is more of a sanity check than a question, but does this make sense? I don't see why ack-ing the same tuple instance twice would cause the spout to register timeouts, but it seems that is what happened in my case.

ab11 asked Sep 19 '25 19:09


1 Answer

It does make sense.

Storm tracks all of the acks (direct and indirect) for a tuple emitted by a spout in an odd but effective manner. I'm not sure of the exact algorithm, but it entails repeatedly XOR'ing what was originally the spout-emitted tuple ID with the IDs of subsequent anchored tuples. Each of those subsequent IDs is XOR'ed twice: once when the tuple is anchored and once when the tuple is acked. When the result of the XORs is all zeros, the assumption is that every anchor has been matched by an ack and the original spout-emitted tuple has finished processing.
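A toy Python model of the XOR bookkeeping described above may make it concrete. This is a sketch under stated assumptions, not Storm's acker code: `AckerSketch`, `spout_emit` and `bolt_ack` are hypothetical names, each ack is modeled as one message carrying the acked tuple's ID XOR'ed with the IDs of any new tuples anchored to it, and the IDs are small distinct bit patterns so the demo is deterministic (Storm itself uses random 64-bit IDs).

```python
class AckerSketch:
    """Toy model of XOR-based tuple-tree tracking (hypothetical, not Storm's API)."""

    def __init__(self):
        # Spout root ID -> running XOR of every ID seen for that tuple tree.
        self.ack_vals = {}

    def spout_emit(self, root_id):
        # The spout-emitted tuple's ID is XOR'ed in once when it is emitted.
        self.ack_vals[root_id] = root_id

    def bolt_ack(self, root_id, acked_id, anchored_ids=()):
        # An ack XORs in the acked tuple's ID plus the IDs of any tuples
        # newly anchored to it, so every ID ends up XOR'ed exactly twice.
        msg = acked_id
        for child_id in anchored_ids:
            msg ^= child_id
        val = self.ack_vals.get(root_id, 0) ^ msg
        if val == 0:
            # Every anchor matched by an ack: the tree is complete, drop it.
            self.ack_vals.pop(root_id, None)
            return True
        self.ack_vals[root_id] = val
        return False


# Hypothetical IDs using distinct bits, so no subset of them XORs to zero.
ROOT, CHILD_1, CHILD_2 = 0b001, 0b010, 0b100

acker = AckerSketch()
acker.spout_emit(ROOT)
# Bolt A emits two children anchored to ROOT, then acks ROOT.
print(acker.bolt_ack(ROOT, ROOT, [CHILD_1, CHILD_2]))  # False
print(acker.bolt_ack(ROOT, CHILD_1))  # False: CHILD_2 still outstanding
print(acker.bolt_ack(ROOT, CHILD_2))  # True: every ID XOR'ed twice
```

The appeal of the scheme is that the acker keeps a single integer per spout tuple, however large the tuple tree grows.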

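By ack-ing some tuples more than once, you made it look as though some of the spout-emitted tuples never finished processing: an ID that is XOR'ed an odd number of times never cancels out, so the running result never reaches zero and the spout eventually fails those tuples when they time out.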
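The double-ack failure can be reproduced in a similar toy model. This assumes (an assumption, not taken from Storm's source) that a repeated ack re-sends the same message as the first one, i.e. the acked tuple's ID XOR'ed with the ID of the child anchored to it. `run_acker`, `ROOT` and `CHILD` are hypothetical names.

```python
# Hypothetical IDs; distinct bits keep the toy example deterministic.
ROOT, CHILD = 0b01, 0b10


def run_acker(ack_messages):
    """Apply XOR'ed ack messages in order. Return the index of the message
    that brought the running value to zero (tree complete), else None."""
    val = 0
    for step, msg in enumerate(ack_messages):
        val ^= msg
        if val == 0:
            return step
    return None


correct = [
    ROOT,          # spout emits ROOT
    ROOT ^ CHILD,  # bolt A acks ROOT, having anchored CHILD to it
    CHILD,         # bolt B acks CHILD
]
buggy = [
    ROOT,
    ROOT ^ CHILD,  # bolt A acks ROOT ...
    ROOT ^ CHILD,  # ... and acks it again (the double-ack bug)
    CHILD,
]

print(run_acker(correct))  # 2: complete after bolt B's ack
print(run_acker(buggy))    # None: ROOT and CHILD each XOR'ed 3 times
```

In a real topology, a tree that never reaches zero leaves the spout tuple pending until the message timeout expires, at which point the spout fails it, which matches the symptom described in the question.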

Chris Gerken answered Sep 23 '25 06:09