Hello, other Storm users:
The guidelines for setting up a Storm cluster (https://github.com/nathanmarz/storm/wiki/Setting-up-a-Storm-cluster) indicate that the supervisor.slots.ports configuration property should be set so that you allocate a separate port for every worker on a machine.
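For reference, the storm.yaml entry for a machine meant to run four workers would look something like this (6700-6703 are Storm's conventional default ports; yours may differ):

    supervisor.slots.ports:
        - 6700
        - 6701
        - 6702
        - 6703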
My understanding is that each worker is a JVM instance that listens for commands from the nimbus controller, so it makes sense that each one listens on a separate port.
However, there is also a method on backtype.storm.Config that seems to allow the number of workers to be defined. What if the call to setNumWorkers tries to set more workers than you have configured ports for? That would seem to mess things up.
The only thing that makes sense to me is that the yaml configuration defines the upper bound on the number of workers. Each topology may request that some workers be allocated to it. But if I submitted two topologies (to some particular cluster), each making the call Config.setNumWorkers(2), then I had better have four ports configured.
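To make the two-topology scenario concrete, here is a rough sketch; the class name, topology names, and the trivial spout wiring are my own illustration, not anything from the Storm docs:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.generated.StormTopology;
    import backtype.storm.testing.TestWordSpout;
    import backtype.storm.topology.TopologyBuilder;

    public class TwoTopologies {
        // Minimal stand-in topology; a real one would have bolts too.
        static StormTopology buildTopology(String spoutId) {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout(spoutId, new TestWordSpout(), 2);
            return builder.createTopology();
        }

        public static void main(String[] args) throws Exception {
            Config conf = new Config();
            conf.setNumWorkers(2); // each topology asks for 2 workers
            // Submitting both to the same cluster means the cluster needs
            // at least 4 entries across its supervisor.slots.ports lists.
            StormSubmitter.submitTopology("topology-one", conf, buildTopology("spout-one"));
            StormSubmitter.submitTopology("topology-two", conf, buildTopology("spout-two"));
        }
    }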
Is this the right idea?
Thanks in advance. -chris
Well, I think the upper-bound guess was correct. I set up a one-machine Storm cluster on my laptop, then built ExclamationTopology (from storm-starter). I set up only two workers, but ExclamationTopology has an invocation of

    conf.setNumWorkers(3);
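For the record, limiting the cluster to two workers just means two entries in storm.yaml (I'm showing the default ports here; the exact numbers don't matter):

    supervisor.slots.ports:
        - 6700
        - 6701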
But when I look at the Storm UI, it tells me 'Num Workers' is 2.
So it seems that what you set in the storm.yaml file is an upper bound, and if you ask for more workers than you have configured ports for, you just get the maximum available.
(Caveat: I'm just getting into this stuff and am by no means an expert, so there's a chance I missed something. But the above report is what I observed.)
You've basically got it right.
There is an important distinction between slots and workers. Slots are places where workers can be realized: when you set up a supervisor with, say, 10 slots, you are setting it up to run up to 10 workers simultaneously on that supervisor. If you request more workers than slots, Storm will do what it can to schedule the work in the available slots. In some cases this means, for example, that a worker may come into a slot, do some work, and then be replaced by another worker so that a topology can continue. This is not too different from how an OS schedules processes to run on the limited number of "slots" (processors/cores/hyperthreads/whatever) it has available.
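If you want to see the slot accounting directly rather than through the UI, something like the following sketch works against the backtype.storm-era Thrift API (exact method names may differ in other Storm versions):

    import java.util.Map;
    import backtype.storm.generated.ClusterSummary;
    import backtype.storm.generated.SupervisorSummary;
    import backtype.storm.utils.NimbusClient;
    import backtype.storm.utils.Utils;

    public class SlotReport {
        public static void main(String[] args) throws Exception {
            Map conf = Utils.readStormConfig();
            NimbusClient nimbus = NimbusClient.getConfiguredClient(conf);
            ClusterSummary cluster = nimbus.getClient().getClusterInfo();
            for (SupervisorSummary s : cluster.get_supervisors()) {
                // num_workers is the supervisor's slot count (one per
                // configured port); num_used_workers is slots in use.
                System.out.println(s.get_host() + ": "
                        + s.get_num_used_workers() + "/"
                        + s.get_num_workers() + " slots in use");
            }
        }
    }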