
How would I split a stream in Apache Storm?

Tags: apache-storm

I don't understand how to split a stream in Apache Storm. For example, I have bolt A that, after some computation, has somevalue1, somevalue2, and somevalue3. It needs to send somevalue1 to bolt B, somevalue2 to bolt C, and both somevalue1 and somevalue2 to bolt D. How would I do this in Storm? Which grouping would I use, and what would my topology look like? Thank you in advance for your help.

james asked Nov 06 '13 08:11


People also ask

What stream grouping method is present in Apache Storm?

There are eight built-in stream groupings in Storm, and you can implement a custom stream grouping by implementing the CustomStreamGrouping interface: Shuffle grouping: Tuples are randomly distributed across the bolt's tasks in a way such that each bolt is guaranteed to get an equal number of tuples.

What is topology in Apache Storm?

The Storm topology is basically a Thrift structure. TopologyBuilder class provides simple and easy methods to create complex topologies. The TopologyBuilder class has methods to set spout (setSpout) and to set bolt (setBolt). Finally, TopologyBuilder has createTopology to create topology.

What is responsible for transforming a stream in Storm?

Basically, a bolt is the processing powerhouse of a Storm topology and is responsible for transforming a stream.

What is Nimbus in Apache Storm?

The Nimbus node is the master in a Storm cluster. It is responsible for distributing the application code across various worker nodes, assigning tasks to different machines, monitoring tasks for any failures, and restarting them as and when required. Nimbus is stateless and stores all of its data in ZooKeeper.


2 Answers

You can use multiple named streams if your case needs that. It is not really splitting, but it gives you a lot of flexibility. For instance, you can use it for content-based routing from a bolt:

You declare the stream in the bolt:

    @Override
    public void declareOutputFields(final OutputFieldsDeclarer outputFieldsDeclarer) {
        outputFieldsDeclarer.declareStream("stream1", new Fields("field1"));
        outputFieldsDeclarer.declareStream("stream2", new Fields("field1"));
    }

You emit from the bolt on the chosen stream:

    collector.emit("stream1", new Values("field1Value"));

You subscribe to the correct stream in the topology:

    builder.setBolt("myBolt1", new MyBolt1()).shuffleGrouping("boltWithStreams", "stream1");
    builder.setBolt("myBolt2", new MyBolt2()).shuffleGrouping("boltWithStreams", "stream2");
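Applied to the scenario in the question, the same pattern might look like the sketch below. It is only an illustration under a few assumptions: the stream names ("toB", "toC", "toD"), the component ids, the class name SplitTopologySketch, and the placeholder computation inside bolt A are all made up for this example, and bolts B, C, D and the spout are passed in rather than defined. The imports use the org.apache.storm packages of Storm 1.0 and later; older releases use backtype.storm instead.

```java
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.IBasicBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SplitTopologySketch {

    /** Bolt A: computes its values, then emits them on one named stream per consumer. */
    public static class BoltA extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // Placeholder computation (hypothetical): replace with the real logic.
            String raw = String.valueOf(input.getValue(0));
            String somevalue1 = raw.toUpperCase();
            String somevalue2 = raw.toLowerCase();

            collector.emit("toB", new Values(somevalue1));             // only somevalue1
            collector.emit("toC", new Values(somevalue2));             // only somevalue2
            collector.emit("toD", new Values(somevalue1, somevalue2)); // both values
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Every stream used in emit() must be declared with its own field list.
            declarer.declareStream("toB", new Fields("somevalue1"));
            declarer.declareStream("toC", new Fields("somevalue2"));
            declarer.declareStream("toD", new Fields("somevalue1", "somevalue2"));
        }
    }

    /** Wires the spout, bolt A, and the three consumers, each subscribed to one stream. */
    public static StormTopology build(IRichSpout spout, IBasicBolt boltB,
                                      IBasicBolt boltC, IBasicBolt boltD) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("source", spout);
        builder.setBolt("boltA", new BoltA()).shuffleGrouping("source");
        builder.setBolt("boltB", boltB).shuffleGrouping("boltA", "toB");
        builder.setBolt("boltC", boltC).shuffleGrouping("boltA", "toC");
        builder.setBolt("boltD", boltD).shuffleGrouping("boltA", "toD");
        return builder.createTopology();
    }
}
```

Shuffle grouping is used here only because the question does not say how tuples should be partitioned among a consumer's tasks. If bolt D needed all tuples with the same somevalue1 to reach the same task, fieldsGrouping("boltA", "toD", new Fields("somevalue1")) would be the variant to try.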
zenbeni answered Sep 29 '22 21:09


You have two options here: stream groupings and "Direct Grouping". Depending on your requirements, one of them will serve you.

Have a look at the WordCountTopology sample project to see whether it is what you are looking for. Otherwise, "Direct Grouping" is going to be a better alternative.

But again, picking a grouping strategy depends on your requirements.

Chiron answered Sep 29 '22 20:09