Storm topology configuration

Tags:

apache-storm

How do you provide a custom configuration to a Storm topology? For example, if I have built a topology that connects to a MySQL cluster and I want to be able to change which servers it connects to without recompiling, how would I do that? My preference would be to use a config file, but my concern is that the file itself is not deployed to the cluster, so it would not be available when the topology runs (unless my understanding of how a cluster works is flawed). The only way I've seen so far to pass configuration options into a Storm topology at runtime is via command-line parameters, but that gets messy once you have a good number of them.

One thought I did have is to use a shell script to read the file into a variable and pass the contents of that variable to the topology as a string, but I'd like something a little cleaner if possible.

Has anyone else encountered this? If so, how did you solve it?

EDIT:

It appears I need to provide more clarification. My scenario is that I have a topology that I want to be able to deploy in different environments without having to recompile it. Normally, I'd create a config file that contains things like database connection parameters and have that passed in. I'd like to know how to do something like that in Storm.

blockcipher asked Aug 05 '13 14:08



3 Answers

You can specify a configuration (typically via a YAML file) that you submit with your topology. In our own project we keep separate config files for development and production, each holding our server, Redis, and database IPs, ports, and so on. When we run the command that builds the jar and submits the topology to Storm, it includes the correct config file for the deployment environment. The bolts and spouts then read the settings they need from the stormConf map, which is passed to a bolt's prepare() method and to a spout's open() method.
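
As a rough illustration of the reading side, here is a minimal sketch of a helper a bolt or spout could call on the map it receives. It uses only the JDK, so the Storm classes themselves are not shown. The class name ConfReader and the keys such as myapp.db.host are hypothetical. Because Storm serializes the topology configuration as JSON, a number may come back as a different Number type than the one that was put in, so the helper accepts both numbers and numeric strings.

```java
import java.util.Map;

// Hypothetical helper, not part of Storm: reads typed settings from the
// configuration map that a bolt's prepare() or a spout's open() receives.
class ConfReader {

    // Returns the value for key as a String, or fallback if the key is absent.
    static String getString(Map<String, ?> conf, String key, String fallback) {
        Object value = conf.get(key);
        return value == null ? fallback : value.toString();
    }

    // Returns the value for key as an int, accepting either a Number or a
    // numeric String; returns fallback if the key is absent.
    static int getInt(Map<String, ?> conf, String key, int fallback) {
        Object value = conf.get(key);
        if (value == null) {
            return fallback;
        }
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        return Integer.parseInt(value.toString().trim());
    }
}
```

Inside prepare(), a bolt might then call something like ConfReader.getString(stormConf, "myapp.db.host", "localhost") before opening its database connection.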

From http://storm.apache.org/documentation/Configuration.html :

Every configuration has a default value defined in defaults.yaml in the Storm codebase. You can override these configurations by defining a storm.yaml in the classpath of Nimbus and the supervisors. Finally, you can define a topology-specific configuration that you submit along with your topology when using StormSubmitter. However, the topology-specific configuration can only override configs prefixed with "TOPOLOGY".

Storm 0.7.0 and onwards lets you override configuration on a per-bolt/per-spout basis.

You'll also see on http://nathanmarz.github.io/storm/doc/backtype/storm/StormSubmitter.html that submitJar and submitTopology are each passed a map called conf.
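
One way to fill that conf map without recompiling is to load an environment-specific file at submit time. The sketch below is only an assumption about how this could look: it uses a JDK properties file rather than YAML (which would need an extra parser library), and the class name TopologyConfLoader and the myapp.* keys are made up for the example. Values are kept as strings so they stay JSON-serializable.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Storm: turns key=value settings into a map
// that can be merged into the conf passed at submit time.
class TopologyConfLoader {

    // Reads properties-format text from source and copies every entry into a
    // plain map of String keys to String values.
    static Map<String, Object> load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Object> settings = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            settings.put(key, props.getProperty(key));
        }
        return settings;
    }
}

// In the class that submits the topology, the result could be merged in
// roughly like this (Storm calls shown as comments only):
//   Config conf = new Config();
//   conf.putAll(TopologyConfLoader.load(new FileReader(args[0])));
//   StormSubmitter.submitTopology("my-topology", conf, topology);
```

With this approach the only command-line parameter is the path to the file, for example a development or a production properties file.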

Hope this gets you started.

veroxii answered Nov 08 '22 11:11


I solved this problem by just providing the config in code:

config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, SOME_OPTS);

I tried to provide a topology-specific storm.yaml, but it didn't work. Please correct me if you have managed to get a topology-specific storm.yaml working.

Update:
For anyone who wants to know what SOME_OPTS is, this is from Satish Duggana on the Storm mailing list:

Config.TOPOLOGY_WORKER_CHILDOPTS: Options which can override WORKER_CHILDOPTS for a topology. You can configure any java options like memory, gc etc

In your case it can be

config.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx1g");
Adrian Liu answered Nov 08 '22 12:11


What might actually serve you best is to store the configuration in a mutable key-value store (S3, Redis, etc.) and pull it in to configure the database connection you then use. (I assume you are already planning to limit how often you talk to the database, so the overhead of fetching this config should not be a big deal.) This design lets you change the database connection on the fly, without even redeploying the topology.
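
A minimal sketch of that idea, assuming the lookup against the store is wrapped in a Supplier so the example stays independent of any particular Redis or S3 client. The class name RefreshingConfig, the refresh interval, and the injected clock are all choices made for illustration. The config is fetched again only once the cached copy is older than the interval, which keeps the overhead bounded.

```java
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper: caches settings fetched from an external store and
// refreshes them once the cached copy is older than ttlMillis.
class RefreshingConfig {
    private final Supplier<Map<String, String>> fetcher; // stand-in for a Redis/S3 lookup
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so the refresh logic can be tested
    private Map<String, String> cached;
    private long fetchedAt;

    RefreshingConfig(Supplier<Map<String, String>> fetcher, long ttlMillis, LongSupplier clock) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the current value for key, refetching all settings first if the
    // cache is empty or has expired.
    synchronized String get(String key) {
        long now = clock.getAsLong();
        if (cached == null || now - fetchedAt >= ttlMillis) {
            cached = fetcher.get();
            fetchedAt = now;
        }
        return cached.get(key);
    }
}
```

A bolt could create one instance in prepare(), passing System::currentTimeMillis as the clock, and reconnect to the database whenever the returned host differs from the one it is currently using.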

G Gordon Worley III answered Nov 08 '22 12:11