I've been looking at Esper (and Storm) for stream processing. Esper seems to do exactly what I want, i.e. rolling means, medians, complex queries, etc., but one thing has me wondering.
How would I scale out to multiple instances with Esper?
As far as I understand, Storm handles distributed processing, but with Esper you're on your own.
I wouldn't need to do this for the foreseeable future, but as we grow, so will our data volumes, and we would then need to scale out as well. Most likely we would be deployed on Amazon EC2.
Would I need to run multiple servers and shard data before sending them to my Esper application?
Is there a more graceful way of handling it?
-Sajal
You can run an Esper instance within a bolt, meaning that Storm will handle tuple/event federation, and Esper will handle the CEP on events it receives in a given bolt.
This has some code and information about embedding Esper in a Storm bolt: http://tomdzk.wordpress.com/2011/09/28/storm-esper/
However, you need a use case in which relatively stateless Esper engines can each handle a subset of the data.
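For example: you are computing average daily temperature by city. If you don't distribute your tuples using fieldsGrouping on the city field (shuffleGrouping distributes tuples randomly, regardless of any field), then each Esper bolt could see a different subset of the data for the same city, and each would compute its own partial average.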
Basically, be sure to read up on how data is distributed in a Storm topology before committing to this architecture.
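
To make the grouping point concrete, here is a rough sketch in plain Python. It does not use Storm or Esper at all: the function names, the sample readings and the worker count are all made up for illustration, and the round-robin router is only a deterministic stand-in for Storm's shuffle grouping (which is random but evenly spread). Each "worker" plays the role of one Esper bolt computing a per-city average.

```python
import itertools
import zlib
from collections import defaultdict


def fields_route(key, num_workers):
    # Stable hash: the same key always lands on the same worker.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def per_worker_averages(readings, route, num_workers):
    # One accumulator per worker: city -> [running total, count].
    sums = [defaultdict(lambda: [0.0, 0]) for _ in range(num_workers)]
    for city, temp in readings:
        acc = sums[route(city)][city]
        acc[0] += temp
        acc[1] += 1
    return [
        {city: total / count for city, (total, count) in worker.items()}
        for worker in sums
    ]


def workers_reporting(results, city):
    # Indices of the workers that produced an average for this city.
    return [i for i, averages in enumerate(results) if city in averages]


# Hypothetical sample data.
readings = [
    ("Oslo", 2.0), ("Oslo", 4.0), ("Cairo", 30.0),
    ("Cairo", 34.0), ("Oslo", 6.0), ("Cairo", 32.0),
]
NUM_WORKERS = 3

# Grouping on the city field: each city is handled by exactly one worker.
by_field = per_worker_averages(
    readings, lambda city: fields_route(city, NUM_WORKERS), NUM_WORKERS
)

# Round-robin as a stand-in for shuffle grouping: the city is ignored.
next_worker = itertools.cycle(range(NUM_WORKERS))
by_shuffle = per_worker_averages(
    readings, lambda city: next(next_worker), NUM_WORKERS
)

print("fields grouping: ", by_field)
print("shuffle grouping:", by_shuffle)
```

With key-based routing, a single worker sees all Oslo readings and reports one average. With the shuffle stand-in, Oslo readings are spread over several workers, and each reports a different partial average. In a real topology the equivalent choice is made when wiring the bolt, via fieldsGrouping versus shuffleGrouping.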