
Spark Streaming Kafka Stream batch execution

I'm new to Spark Streaming and I have a general question about its usage. I'm currently implementing an application which streams data from a Kafka topic.

Is it a common scenario to use the application to run a batch only once, for example at the end of the day, collecting all the data from the topic, performing some aggregation and transformation, and so on?

That means that after starting the app with spark-submit, all this work would be performed in one batch and then the application would shut down. Or is Spark Streaming built to run endlessly, streaming data permanently in continuous batches?
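The run-once pattern described above is supported by Spark Structured Streaming via a one-shot trigger. Below is a minimal sketch in Scala; the topic name `events`, the broker address, and the per-key count aggregation are illustrative placeholders, and `Trigger.Once()` (superseded by `Trigger.AvailableNow()` in Spark 3.3+) makes the query process everything currently available and then stop:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("EndOfDayBatch").getOrCreate()
import spark.implicits._

// Read the full topic from the beginning (placeholder broker/topic names).
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .option("startingOffsets", "earliest")
  .load()
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

// Example aggregation: count events per key.
val counts = events.groupBy($"key").count()

// Trigger.Once() runs a single batch over all available data, then the
// query terminates on its own instead of waiting for new records.
val query = counts.writeStream
  .outputMode("complete")
  .format("console")
  .trigger(Trigger.Once())
  .start()

query.awaitTermination() // returns once the single batch has finished
spark.stop()
```

With a checkpoint location configured, a scheduled nightly `spark-submit` of this job would pick up only the offsets that arrived since the previous run, which matches the end-of-day use case.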

asked Dec 31 '25 by Vik

1 Answer

You can use the Kafka Streams API and define a window time to perform aggregation and transformation over the events in your topic, one batch at a time. For more information about windowing, see https://kafka.apache.org/21/documentation/streams/developer-guide/dsl-api.html#windowing
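As a hedged sketch of that suggestion, here is a windowed count with the Kafka Streams DSL (Java API called from Scala). The topic names, application id, broker address, and the 24-hour tumbling window are all placeholder assumptions; `TimeWindows.of` matches the Kafka 2.1 docs linked above (newer releases prefer `TimeWindows.ofSizeWithNoGrace`):

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}
import org.apache.kafka.streams.kstream.{Produced, TimeWindows}

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "end-of-day-aggregator") // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092")        // placeholder
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)

val builder = new StreamsBuilder()
builder
  .stream[String, String]("events")                       // input topic (placeholder)
  .groupByKey()
  .windowedBy(TimeWindows.of(Duration.ofHours(24)))       // one tumbling window per day
  .count()
  .toStream((windowedKey, _) =>                           // flatten the windowed key
    s"${windowedKey.key}@${windowedKey.window.start}")
  .to("daily-counts", Produced.`with`(Serdes.String(), Serdes.Long())) // output topic

val streams = new KafkaStreams(builder.build(), props)
streams.start()
sys.addShutdownHook(streams.close())
```

Note the trade-off versus the run-once Spark approach: a Kafka Streams app like this runs continuously and emits per-window results as they close, rather than being submitted once and exiting.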

answered Jan 03 '26 by Mehdi Bahra


