Storm topology configuration

Tags:

apache-storm

How do you provide a custom configuration to a Storm topology? For example, if I have built a topology that connects to a MySQL cluster and I want to be able to change which servers it connects to without recompiling, how would I do that? My preference would be to use a config file, but my concern is that the file itself is not deployed to the cluster, so the topology would never see it (unless my understanding of how a cluster works is flawed). The only way I've seen so far to pass configuration options into a Storm topology at runtime is via command-line parameters, but that gets messy once you have a good number of them.

One thought I did have is to use a shell script to read the file into a variable and pass the contents of that variable to the topology as a string, but I'd like something a little cleaner if possible.

Has anyone else encountered this? If so, how did you solve it?

EDIT:

It appears I need to provide more clarification. My scenario is that I have a topology that I want to be able to deploy in different environments without having to recompile it. Normally, I'd create a config file that contains things like database connection parameters and have that passed in. I'd like to know how to do something like that in Storm.


asked Aug 05 '13 14:08

blockcipher

3 Answers

You can specify a configuration (typically via a YAML file) that you submit with your topology. In our own project we keep separate config files for development and production, each holding our server, Redis and database IPs, ports and so on. When we run the command that builds the jar and submits the topology to Storm, it includes the config file that matches the deployment environment. The bolts and spouts then simply read the settings they need from the stormConf map, which is passed to a bolt's prepare() method (or a spout's open() method).
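As a rough sketch of that flow, assuming a Java properties file instead of YAML (to stay within the JDK) and hypothetical key names such as mysql.host: the submitter loads the file into a map, and a component reads values back out of the map it is handed. The class and helper names below are illustrative and are not part of Storm; the Storm calls appear only in comments.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class TopologyConfigSketch {

    // Load key/value pairs from a properties source into a plain map.
    public static Map<String, Object> load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        Map<String, Object> conf = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            conf.put(name, props.getProperty(name));
        }
        return conf;
    }

    // Read a string setting, falling back to a default when the key is absent.
    public static String getString(Map<?, ?> conf, String key, String fallback) {
        Object value = conf.get(key);
        return value == null ? fallback : value.toString();
    }

    // Read an integer setting, falling back to a default when the key is absent.
    public static int getInt(Map<?, ?> conf, String key, int fallback) {
        Object value = conf.get(key);
        return value == null ? fallback : Integer.parseInt(value.toString().trim());
    }

    public static void main(String[] args) throws IOException {
        // In a real submitter this would be a FileReader on a path taken from args.
        Reader file = new StringReader("mysql.host=db1.example.com\nmysql.port=3306\n");
        Map<String, Object> settings = load(file);

        // With Storm on the classpath, the settings would be copied into the
        // topology config before submitting, for example:
        //   Config conf = new Config();
        //   conf.putAll(settings);
        //   StormSubmitter.submitTopology(name, conf, topology);
        // A bolt would then call getString/getInt on the map given to prepare().
        System.out.println(getString(settings, "mysql.host", "localhost"));
        System.out.println(getInt(settings, "mysql.port", 3306));
    }
}
```

Because only the file path changes between environments, the same jar can be submitted to development and production with a single command-line argument.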

From http://storm.apache.org/documentation/Configuration.html :

Every configuration has a default value defined in defaults.yaml in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using StormSubmitter. However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".

Storm 0.7.0 and onwards lets you override configuration on a per-bolt/per-spout basis.

You'll also see on http://nathanmarz.github.io/storm/doc/backtype/storm/StormSubmitter.html that submitJar and submitTopology are each passed a map called conf.

Hope this gets you started.

answered Nov 08 '22 11:11

veroxii

I solved this problem by just providing the config in code:

config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, SOME_OPTS);

I tried to provide a topology-specific storm.yaml, but it didn't work. Please correct me if you manage to get a storm.yaml working this way.

Update:
For anyone who wants to know what SOME_OPTS is, this is from Satish Duggana on the Storm mailing list:

Config.TOPOLOGY_WORKER_CHILDOPTS: Options which can override WORKER_CHILDOPTS for a topology. You can configure any java options like memory, gc etc

In your case it can be

config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx1g");
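Since the child options are ordinary JVM options, one possible way to apply this to the database-host question is to pass settings as -D system properties. The following is only a sketch under that assumption: the ChildOptsSketch class, the build helper and the mysql.* keys are hypothetical, and values containing spaces are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChildOptsSketch {

    // Join a base JVM option string with -Dkey=value entries, in insertion order.
    public static String build(String baseOpts, Map<String, String> systemProps) {
        StringBuilder opts = new StringBuilder(baseOpts);
        for (Map.Entry<String, String> e : systemProps.entrySet()) {
            if (opts.length() > 0) {
                opts.append(' ');
            }
            opts.append("-D").append(e.getKey()).append('=').append(e.getValue());
        }
        return opts.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mysql.host", "db1.example.com");
        props.put("mysql.port", "3306");
        String opts = build("-Xmx1g", props);

        // With Storm on the classpath:
        //   config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, opts);
        // Code running in the worker could then read the value with
        //   System.getProperty("mysql.host")
        System.out.println(opts);
    }
}
```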

answered Nov 08 '22 12:11

Adrian Liu

What might actually serve you best is to store the configuration in a mutable key-value store (S3, Redis, etc.) and then pull it in to configure the database connection that you use. (I assume here that you are already planning to limit how often you talk to the database, so the overhead of fetching this config is not a big deal.) This design lets you change the database connection on the fly, with no need to even redeploy the topology.
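A minimal sketch of that idea, assuming the store lookup is wrapped in a plain Supplier so that no particular Redis or S3 client is implied: the config is fetched lazily and re-fetched once it is older than a chosen time-to-live. The RefreshingConfig class and its parameters are hypothetical names for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class RefreshingConfig {
    private final Supplier<Map<String, String>> fetcher; // stands in for the key-value store read
    private final long ttlMillis;                        // how long a fetched copy stays valid
    private final LongSupplier clock;                    // injectable time source, in milliseconds
    private Map<String, String> cached;
    private long fetchedAt;

    public RefreshingConfig(Supplier<Map<String, String>> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Return the cached config, re-fetching it when it is missing or older than the TTL.
    public synchronized Map<String, String> get() {
        long now = clock.getAsLong();
        if (cached == null || now - fetchedAt >= ttlMillis) {
            cached = fetcher.get();
            fetchedAt = now;
        }
        return cached;
    }

    public static void main(String[] args) {
        // The lambda is a placeholder for a real lookup against the store.
        RefreshingConfig config = new RefreshingConfig(
                () -> Collections.singletonMap("mysql.host", "db1.example.com"),
                30_000,
                System::currentTimeMillis);
        System.out.println(config.get().get("mysql.host"));
    }
}
```

A bolt could hold one such object and call get() before opening or reusing a connection; when the returned host differs from the one currently in use, it would reconnect.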

answered Nov 08 '22 12:11

G Gordon Worley III
