While starting to learn stream processing, I keep hearing two terms: stateful stream processing and stateless stream processing. What is the difference between them? I heard that Storm is stateless while Storm Trident is stateful, so in practice, where should I use Storm and where should I use Storm Trident?
In general, stateful stream processing is an application design pattern for processing an unbounded stream of events. Stateful stream processing means that state is shared between events (stream entities), and therefore past events can influence the way current events are processed.
Stateless transformations do not require state for processing, and they do not require a state store associated with the stream processor. Kafka 0.11.0 and later allows you to materialize the result from a stateless IKTable transformation.
Stateless means there is no record of previous interactions and each interaction request has to be handled based entirely on information that comes with it. Stateful and stateless are derived from the usage of state as a set of conditions at a moment in time.
Most IT companies that build microservices are already creating stateless applications using REST API design. This concept is the foundation on which most modern architectures, such as RESTful design, are based.
The difference between the two is, at a very high level, in the kind of operation you have to perform on the stream.
Some operations are stateless, that is, you process one record at a time. Think of a bank teller who processes a stream of customers, one at a time. Each customer is a new unit of work that does not depend on the previous one.
A stateful operation is like hiring a new employee. You have a stream of people coming in for interviews, but whether you hire them or not depends on your state, that is, which positions you have open.
For example, let's say you're processing web logs. If you want to know how many users are looking at a page per second, your processing is almost stateless: every second you calculate how many users came per page. Each new second, you don't care about the result of the previous second. That is a stateless operation.
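As a rough sketch of the per-second count described above, assuming each log record is a hypothetical `(user, page)` tuple and that records have already been grouped into one batch per second, the function below keeps nothing between calls:

```python
from collections import Counter

def views_per_page(batch):
    """Count views per page for one second's worth of log records.

    Each record is a hypothetical (user, page) tuple. Nothing is kept
    between calls, so every second is processed independently.
    """
    return Counter(page for _user, page in batch)

# Two consecutive one-second batches (made-up sample data).
second_1 = [("alice", "/home"), ("bob", "/home"), ("carol", "/about")]
second_2 = [("dave", "/about")]

counts_1 = views_per_page(second_1)
counts_2 = views_per_page(second_2)  # unaffected by second_1
```

Because the result for one second never feeds into the next, the batches could be processed in any order, or on different machines.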
Let's say that instead you want to calculate a forecast of how many users you'll have in the next second. You want to average the last 10 minutes, so you need to keep a queue with the last 10 * 60 seconds. That's the state you need to keep for your processing, and you need to update it every second so that it always holds the most recent 10 minutes. That's of course a stateful operation. A simpler stateful operation is just counting the total number of page views since the beginning of the site.
One critical difference between the two operations is that with a stateful operation, if the stream stops and you reset the system, you have to take care of saving the state. A stateless operation does not have any state to save, so it's generally simpler.
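The forecast and the running total could be sketched like this. The class name `RollingForecast` and its parameters are made up for illustration, and the "forecast" is simply the mean of the window, as in the description above:

```python
from collections import deque

class RollingForecast:
    """Keeps the last `window_seconds` per-second counts plus a running total."""

    def __init__(self, window_seconds=10 * 60):
        # A deque with maxlen drops the oldest entry automatically,
        # so it always holds at most the most recent window.
        self.window = deque(maxlen=window_seconds)
        self.total_views = 0  # total page views since the start

    def update(self, views_this_second):
        """Add one second's count and return the forecast for the next second."""
        self.window.append(views_this_second)
        self.total_views += views_this_second
        return sum(self.window) / len(self.window)

# A 3-second window keeps the example short; the text uses 10 * 60.
forecast = RollingForecast(window_seconds=3)
predictions = [forecast.update(v) for v in [10, 20, 30, 40]]
```

Here each call to `update` depends on what earlier calls stored in `window` and `total_views`, which is exactly what makes the operation stateful.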
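To illustrate what saving the state can involve, here is a minimal sketch of a page-view counter that checkpoints its total to a JSON file. The class `PageViewCounter` and the file layout are assumptions for this example; real stream processors such as Storm Trident have their own state mechanisms, which are not shown here.

```python
import json
import os
import tempfile

class PageViewCounter:
    """Stateful counter that survives a restart by checkpointing to disk."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.total = 0
        # On startup, restore the last saved state if there is one.
        if os.path.exists(checkpoint_path):
            with open(checkpoint_path) as f:
                self.total = json.load(f)["total"]

    def process(self, views):
        self.total += views
        self._save()
        return self.total

    def _save(self):
        # Write to a temporary file, then rename it over the checkpoint,
        # so a crash mid-write does not leave a half-written file.
        tmp = self.checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"total": self.total}, f)
        os.replace(tmp, self.checkpoint_path)

path = os.path.join(tempfile.mkdtemp(), "counter.json")
counter = PageViewCounter(path)
counter.process(5)
counter.process(7)

# Simulate a restart: a new instance picks up where the old one stopped.
restarted = PageViewCounter(path)
```

A stateless operation needs none of this: after a restart it simply continues with the next record.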