 

What are the reasons to configure more than one worker per cluster node in Apache Storm?

In the following, I refer to this article: Understanding the Parallelism of a Storm Topology by Michael G. Noll

It seems to me that a worker process may host an arbitrary number of executors (threads) to run an arbitrary number of tasks (instances of topology components). Why should I configure more than one worker per cluster node?

The only reason I see is that a worker can only run executors belonging to a single topology. Hence, if I want to run multiple topologies on the same cluster, I would need to configure at least as many workers per cluster node as there are topologies to be run. (For example, I want to stay flexible in case some cluster nodes fail: if only one cluster node remains, I need at least as many worker processes on that node as there are topologies running on the cluster in order to keep all of them alive.)
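For reference, the number of worker slots a node offers is fixed in that node's storm.yaml: each listed port is one worker slot. A sketch of the relevant setting (the ports shown are Storm's defaults and are illustrative here):

```yaml
# storm.yaml on a supervisor node: each port below is one worker slot,
# so this node can host up to four worker JVMs at a time.
supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703
```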

Is there any other reason? Especially, is there any reason to configure more than one worker per cluster node if running only one topology? (Better fail-safety, etc.)

asked Oct 31 '22 by sema

1 Answer

To balance the cost of running one supervisor daemon per node against the blast radius of a worker crashing. If you have one large, monolithic worker JVM, a single crash takes down everything running in that worker, and a badly behaved part of your topology affects more co-residents. By running more than one worker per node, you keep the supervisor efficient while getting something like a bulkhead pattern, avoiding the all-or-nothing approach.
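The blast-radius argument can be made concrete with a toy sketch. This is not Storm's actual scheduler, just a simplified round-robin assignment of executors to worker slots with illustrative numbers, showing how many executors die when one worker JVM crashes:

```python
def assign(executors, num_workers):
    """Distribute executors across worker slots round-robin (a
    simplification of how Storm's default scheduler spreads load)."""
    workers = [[] for _ in range(num_workers)]
    for i, executor in enumerate(executors):
        workers[i % num_workers].append(executor)
    return workers

executors = [f"bolt-{i}" for i in range(8)]

one_worker = assign(executors, 1)
four_workers = assign(executors, 4)

# With a single monolithic worker, one crash kills all 8 executors.
print(len(one_worker[0]))    # 8
# With four workers per topology, a crash kills only the 2 it hosts.
print(len(four_workers[0]))  # 2
```

The same trade-off applies on a single node: two workers on one node means a crash or a long GC pause in one JVM leaves the other half of your executors running.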

The shared resources I refer to could be yours or Storm's; several pieces of Storm's architecture are shared per JVM and can create contention problems. Specifically, I mean the receive and send threads and the underlying network components. Documented here.
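For completeness, the per-topology worker count is a plain config knob; a sketch, assuming it is set at submit time (the value 4 is illustrative):

```yaml
# Per-topology setting (equivalent to Config.setNumWorkers in the Java API):
# spread this topology's executors across 4 worker JVMs, so each JVM gets
# its own receive/send threads and network connections.
topology.workers: 4
```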

answered Nov 08 '22 by danehammer