Storm Bolt Database Connection

I am using Storm (Java) with Cassandra.

One of my bolts inserts data into Cassandra. Is there any way to hold the connection to Cassandra open between instantiations of this bolt?

My application writes at a high rate. The bolt needs to run several times a second, and performance is being hindered by the fact that it connects to Cassandra each time.

It would run a lot faster if I could have a static connection that was held open, but I am not sure how to achieve this in Storm.

To clarify the question:

What is the scope of a static connection in a Storm topology?

Other messaging systems have workers where the "work" goes on in a loop or callback, which can make use of a variable (maybe a static connection) declared outside that loop. Storm's bolts, by contrast, seem to be instantiated each time they are called and cannot have parameters passed to them, so how can I use the same connection to Cassandra?

girlcoder asked Jan 03 '14 11:01


1 Answer

Unlike other messaging systems which have workers where the "work" goes on in a loop or callback which can make use of a variable (maybe a static connection) outside this loop, Storm's bolts seem to be instantiated each time they are called and cannot have parameters passed to them

It's not exactly right to say that Storm bolts are instantiated each time they are called. For example, the prepare method is called only during the initialization phase, i.e. only once. The documentation says:

Called when a task for this component is initialized within a worker on the cluster. It provides the bolt with the environment in which the bolt executes.

So the best bet would be to put the initialization code in the prepare method (or open, in the case of spouts), since these are called when the tasks start. If that code touches anything shared between tasks, such as a static connection, you need to make it thread-safe, because every task calls prepare and tasks in the same worker can run concurrently in their own threads.
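One way to make a shared, static resource safe to initialize from several tasks is the initialization-on-demand holder idiom. The sketch below is plain Java with no Storm or Cassandra classes: FakeConnection, ConnectionHolder and simulatePrepare are hypothetical names, and the threads only stand in for bolt tasks calling prepare at the same moment. Keep in mind that a static field is scoped to one JVM, so in a cluster each worker process would end up with its own copy.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java sketch: no Storm or Cassandra classes are used here.
final class SharedConnectionSketch {

    /** Hypothetical stand-in for a real Cassandra session or connection. */
    static final class FakeConnection {
        static final AtomicInteger OPENED = new AtomicInteger();
        FakeConnection() { OPENED.incrementAndGet(); }
        void write(String row) { /* a real client would send the row here */ }
    }

    /** Holder idiom: the JVM initializes Lazy.INSTANCE exactly once, thread-safely. */
    static final class ConnectionHolder {
        private static final class Lazy {
            static final FakeConnection INSTANCE = new FakeConnection();
        }
        static FakeConnection get() { return Lazy.INSTANCE; }
    }

    /** Simulates several tasks calling prepare() at once; returns how many connections were opened. */
    static int simulatePrepare(int tasks) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            threads[i] = new Thread(() -> {
                try {
                    start.await();                      // line all threads up
                    ConnectionHolder.get().write("row"); // what prepare()/execute() would do
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        start.countDown();                              // release them together
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return FakeConnection.OPENED.get();
    }

    public static void main(String[] args) {
        System.out.println("connections opened: " + simulatePrepare(8));
    }
}
```

However many simulated tasks race to obtain the connection, the holder class is initialized once, so only one FakeConnection is constructed per JVM.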

The execute(Tuple tuple) method, on the other hand, is responsible for the actual processing logic and is called every time the bolt receives a tuple from the corresponding spouts or bolts. So this is what actually gets called every single time the bolt runs.


The cleanup method is called when an IBolt is about to be shut down. The documentation says:

There is no guarantee that cleanup will be called, because the supervisor kill -9's worker processes on the cluster. The one context where cleanup is guaranteed to be called is when a topology is killed when running Storm in local mode.

So it's not true that you can't pass a variable to a bolt: you can initialize any instance variables in the prepare method and then use them during processing.
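A rough sketch of that lifecycle in plain Java follows. WriterBolt is a hypothetical stand-in and does not implement Storm's real interfaces; the ship helper imitates, as an assumption about the mechanism, how a bolt object is serialized on submission and rebuilt on a worker. Simple settings travel in ordinary fields set by the constructor, while the connection-like field is transient and is only created in prepare.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class BoltLifecycleSketch {

    /** Hypothetical stand-in for a bolt; it does not implement Storm's real interfaces. */
    static final class WriterBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String keyspace;              // plain config: travels with the serialized bolt
        private transient List<String> connection;  // stand-in for a connection: built in prepare()
        private int prepareCalls;

        WriterBolt(String keyspace) { this.keyspace = keyspace; }

        /** Called once per task, after the bolt has arrived on the worker. */
        void prepare() {
            connection = new ArrayList<>();
            prepareCalls++;
        }

        /** Called once per tuple; reuses whatever prepare() set up. */
        void execute(String tuple) { connection.add(keyspace + ":" + tuple); }

        int prepareCalls() { return prepareCalls; }
        int written() { return connection.size(); }
    }

    /** Round-trips the bolt through Java serialization, imitating shipment to a worker. */
    static WriterBolt ship(WriterBolt bolt) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(bolt);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (WriterBolt) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** One simulated task: ship the bolt, prepare it once, then feed it tuples. */
    static WriterBolt runTask(int tuples) {
        WriterBolt onWorker = ship(new WriterBolt("demo"));
        onWorker.prepare();
        for (int i = 0; i < tuples; i++) {
            onWorker.execute("tuple-" + i);
        }
        return onWorker;
    }

    public static void main(String[] args) {
        WriterBolt bolt = runTask(5);
        System.out.println(bolt.prepareCalls() + " prepare call, " + bolt.written() + " writes");
    }
}
```

Marking the field transient matters in this sketch because a live connection generally cannot be serialized; building it in prepare means it is created on the machine that will actually use it.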

Regarding the DB connection, I am not exactly sure about your use case since you have not posted any code, but maintaining a pool of connections sounds like a good choice to me.
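To illustrate the pooling idea, here is a minimal fixed-size pool in plain Java. Pool and FakeConnection are hypothetical names for this sketch, not part of Storm or of any Cassandra driver, and a real client library may already provide its own pooling, in which case that should be preferred.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

final class ConnectionPoolSketch {

    /** Hypothetical stand-in for a database connection. */
    static final class FakeConnection {
        final int id;
        FakeConnection(int id) { this.id = id; }
        String write(String row) { return "conn-" + id + " wrote " + row; }
    }

    /** Fixed-size pool: connections are opened once and handed out on demand. */
    static final class Pool {
        private final BlockingQueue<FakeConnection> idle;

        Pool(int size) {
            idle = new ArrayBlockingQueue<>(size);
            for (int i = 0; i < size; i++) {
                idle.add(new FakeConnection(i));
            }
        }

        /** Borrows a connection, runs the work, and always hands the connection back. */
        <T> T withConnection(Function<FakeConnection, T> work) {
            FakeConnection c;
            try {
                c = idle.take(); // blocks while every connection is in use
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a connection", e);
            }
            try {
                return work.apply(c);
            } finally {
                idle.add(c);     // return it even if the work threw
            }
        }

        int idleCount() { return idle.size(); }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(2);
        System.out.println(pool.withConnection(c -> c.write("row-1")));
        System.out.println("idle after use: " + pool.idleCount());
    }
}
```

In a bolt, a pool like this would be created once (for example from prepare, behind a thread-safe holder) and execute would borrow from it per tuple instead of opening a new connection.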

user2720864 answered Sep 19 '22 13:09