This is a question I came up with after reading: What is the "task" in Storm parallelism
If I need to keep some information in the bolt's internal state, for example, in the classical word counting use case, keeping the count of each word seen in the bolt in a hashmap. After executing "rebalance" command, the task of the bolt many be moved to another executor, which may be in another JVM or even another machine. Will the bolt's internal state (the word count hashmap in this example) be transferred to the new environment (instance/JVM/machine)?
Of course putting the word count hashmap in a central place such as Zookeeper won't have this problem. But for performance sake, it seem we need to keep things in memory sometimes.
A nifty feature of Storm is that you can increase or decrease the number of worker processes and/or executors without being required to restart the cluster or the topology. The act of doing so is called rebalancing.
Apache Storm – released by Twitter, is a distributed open-source framework that helps in the real-time processing of data. Apache Storm works for real-time data just as Hadoop works for batch processing of data (Batch processing is the opposite of real-time.
Why use Apache Storm? Apache Storm is a free and open source distributed realtime computation system. Apache Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing.
Bolt represents a node in the topology having the smallest processing logic and the output of a bolt can be emitted into another bolt as input. Storm keeps the topology always running, until you kill the topology. Apache Storm's main job is to run the topology and will run any number of topology at a given time.
Once you run a rebalance the following will happen
Here is a comment by Nathan Marz which should help clear your doubts.
Rebalance is equivalent to those workers being killed and being created from scratch on another machine. If you want "state" to be maintained, I suggest you use something like Trident and keep your state synced on a DFS
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With