Can Apache Storm spouts communicate with each other?

I have a directory which another process throws files into.

Our current Storm implementation reads this directory, selects the oldest file and opens a reader on it. The reader is held as a field of the spout, so each call to nextTuple() emits a single line from the file. Once the spout has finished reading a file, it closes the reader and opens a new reader on the next file.
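For reference, here is a minimal plain-Java sketch of the single-spout logic described above. It is not Storm code. It assumes a local directory read through `java.nio` instead of HDFS, takes "oldest" to mean the earliest last-modified time, and uses an in-memory set to remember finished files, because the question does not say what happens to a file after it has been read. The class name `DirectoryLineSource` is hypothetical. In a real spout, `nextTuple()` would hand the returned line to the output collector instead of returning it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical plain-Java stand-in for the spout's state; no Storm or HDFS classes are used.
class DirectoryLineSource {
    private final Path dir;
    private final Set<Path> finished = new HashSet<>(); // files already read to EOF (assumption)
    private BufferedReader reader;                       // open reader kept as a field, as in the spout
    private Path current;                                // file the reader belongs to

    DirectoryLineSource(Path dir) {
        this.dir = dir;
    }

    /** One nextTuple()-style step: returns the next line, or null when no unread file is left. */
    String nextLine() {
        try {
            while (true) {
                if (reader == null) {
                    Optional<Path> oldest = oldestUnreadFile();
                    if (!oldest.isPresent()) {
                        return null;                     // nothing to emit right now
                    }
                    current = oldest.get();
                    reader = Files.newBufferedReader(current, StandardCharsets.UTF_8);
                }
                String line = reader.readLine();
                if (line != null) {
                    return line;                         // a real spout would emit this line
                }
                reader.close();                          // EOF: close and loop to the next file
                reader = null;
                finished.add(current);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Picks the unread regular file with the earliest last-modified time.
    private Optional<Path> oldestUnreadFile() throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .filter(p -> !finished.contains(p))
                        .min(Comparator.comparingLong(DirectoryLineSource::modifiedMillis));
        }
    }

    private static long modifiedMillis(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Holding the reader between calls is what lets each call return exactly one line while still consuming files in age order.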

To increase throughput, one idea is to have multiple spouts reading multiple files at once. Since these spouts would be competing for the same files in the same directory, is there a way for spouts to communicate so they can negotiate which files to read? (Or to have an overall manager that allocates files to spouts?)
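One way to picture the "manager allocates files" idea without any messaging is a deterministic rule that every spout instance applies on its own: instance `i` of `n` reads only the files whose name hashes to `i`. The sketch below assumes each instance knows its own 0-based index and the total instance count; in a real topology these would have to come from the `TopologyContext` passed to the spout's `open()` method. The `FileAssignment` class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: each spout instance decides on its own which files it owns.
final class FileAssignment {
    private FileAssignment() {}

    /** True if the instance with 0-based taskIndex (out of taskCount) owns fileName. */
    static boolean ownedBy(String fileName, int taskIndex, int taskCount) {
        // floorMod keeps the bucket non-negative even when hashCode() is negative
        return Math.floorMod(fileName.hashCode(), taskCount) == taskIndex;
    }

    /** Filters a directory listing down to the files this instance should read. */
    static List<String> filesFor(List<String> fileNames, int taskIndex, int taskCount) {
        List<String> mine = new ArrayList<>();
        for (String name : fileNames) {
            if (ownedBy(name, taskIndex, taskCount)) {
                mine.add(name);
            }
        }
        return mine;
    }
}
```

Every file maps to exactly one instance, so no two spouts ever open the same file. The trade-off is that the split is static: if one instance falls behind or dies, its files wait for it, and equal file counts do not guarantee equal file sizes.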

The directory and files are stored in, and read from, HDFS.
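A different technique, named plainly here because neither the question nor the answer mentions it, is to let the filesystem arbitrate: each spout tries to atomically rename a file into a "claimed" location, and only the instance whose rename succeeds reads it. The sketch uses the local filesystem through `java.nio.file.Files.move` with `ATOMIC_MOVE`; the rough HDFS counterpart would be Hadoop's `FileSystem.rename(Path, Path)`, which reports failure through its boolean return value rather than an exception. `FileClaimer`, the claimed directory and the owner-id naming scheme are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Optional;

// Hypothetical claim-by-rename helper: whichever spout instance renames a file first owns it.
final class FileClaimer {
    private final Path inbox;      // directory the other process drops files into
    private final Path claimedDir; // hypothetical directory holding files already taken by a spout
    private final String ownerId;  // hypothetical unique id of this spout instance

    FileClaimer(Path inbox, Path claimedDir, String ownerId) {
        this.inbox = inbox;
        this.claimedDir = claimedDir;
        this.ownerId = ownerId;
        try {
            // Create it up front so a missing directory is never mistaken for a lost race.
            Files.createDirectories(claimedDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Tries to take fileName; returns its new location, or empty if another instance got it first. */
    Optional<Path> tryClaim(String fileName) {
        Path source = inbox.resolve(fileName);
        Path target = claimedDir.resolve(ownerId + "__" + fileName);
        try {
            // ATOMIC_MOVE: the rename either happens completely or not at all.
            Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
            return Optional.of(target);
        } catch (NoSuchFileException e) {
            return Optional.empty();       // source already moved away by another instance
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This needs no messages between spouts, but it is only a sketch: a spout that crashes after claiming a file leaves that file stranded in the claimed directory, and recovering such files is not shown.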

asked Oct 01 '14 by Micky


1 Answer

I don't think there is an out-of-the-box way to make two spouts communicate with each other. However, you could try https://github.com/ptgoetz/storm-signals

It provides a BaseSignalSpout that relies on ZooKeeper to send messages between Storm components.

Hope this helps!

answered Oct 23 '22 by fhussonnois