I know that batch processing relies on a collection of data, while stream processing relies on continuous data.
Please explain to me, in simple words, why Apache Airflow is a batch processing solution and not a data streaming one.
Airflow is not a data processing solution at all, neither stream nor batch. Airflow is a "platform to programmatically author, schedule and monitor workflows".
If you want to build a data processing workflow, you should delegate all the actual computation to data processing tools such as Apache Spark. Airflow itself therefore has no limitations (or capabilities) of its own for processing data in a streaming or batch fashion.
But you may notice that streaming workflows are harder to coordinate with Airflow. Workflows in Airflow are written as directed graphs of tasks: only after one task completes does execution move to the next. In stream processing there is no moment of "completion": all processes run continuously and in parallel.
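To make the "run to completion, then move on" model concrete, here is a minimal sketch in plain Python (deliberately not using the Airflow API itself; the task names and toy pipeline are hypothetical). Each step consumes a finite batch of data and finishes before the next step starts, which is exactly the assumption Airflow's directed graphs rely on and a continuous stream never satisfies:

```python
# A toy batch pipeline: each "task" works on a finite collection of
# data and returns, so a downstream task has a clear point at which
# it is allowed to start.

def extract():
    # Reads a finite batch of records and finishes.
    return [1, 2, 3]

def transform(rows):
    # Runs only after extract() has completed.
    return [r * 10 for r in rows]

def load(rows):
    # Runs only after transform() has completed.
    return sum(rows)

# The "DAG" in miniature: extract -> transform -> load.
result = load(transform(extract()))
print(result)  # 60
```

With an unbounded stream, `extract()` would never return, so `transform()` would never be scheduled; that is the structural mismatch between Airflow's completion-based model and streaming.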
To summarize: you can use Airflow to "coordinate" stream processing, but you won't get much benefit from doing so.