Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Delayed queue / message processing in Storm

In my Storm topology, while processing a stream, I want to delay the processing of some messages until some future points in time. What are some reasonable options for doing this?

So far, I have thought about the following:

  • Using Java's Thread.sleep. (However, based on some discussions, this is not a recommended way to efficiently utilize Storm's resources.)
  • Use a delayed queue...
    • In particular, try java.util.concurrent.DelayQueue.
    • Are there other implementations worth trying?
  • Does Storm have some API for delaying a message that I have overlooked?
  • Does ZeroMQ provide a delayed messaging API that Storm (if modified) could take advantage of?
like image 584
David J. Avatar asked May 16 '13 17:05

David J.


2 Answers

We are using topology tick tuples to process pending tuples in bulk. It basically just stores them in memory on every normal tuple and when it receives a tick tuple it processes them into storage/indexing using bulk/pipelined processing.

We also use redis in cases where we have enormous spikes in volume, if a volume spike detected all tuples redirect to local redis storage on each of the hosts and then get pushed back into topology processing after volume dies down. Our situation might not be applicable to yours, just my 2c.

like image 80
iouri Avatar answered Sep 19 '22 12:09

iouri


Use an external message queue to implement a time-delay queue.

Since Storm is fault-tolerant and horizontally distributed, it would make sense to pick a message queue that fits that style, such as:

  • Kafka
  • Amazon SQS
  • RabbitMQ
like image 35
David J. Avatar answered Sep 18 '22 12:09

David J.