
Why Apache Airflow is not a data streaming solution

I know that batch processing works on data that has already been collected, while stream processing works on data that arrives continuously.

Please explain in simple words why Apache Airflow is considered a batch processing tool rather than a data streaming solution.

Arman Malkhasyan asked Sep 16 '25 22:09



1 Answer

Airflow is not a data processing solution at all, whether stream or batch. Airflow is a "platform to programmatically author, schedule and monitor workflows".

If you want to build a data processing workflow, you should delegate all calculations to data processing tools such as Apache Spark. Airflow itself therefore imposes no limitations (and offers no advantages) for processing data in a streaming or batch way.
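To make the delegation idea concrete, here is a minimal sketch in plain Python rather than real Airflow code. The helper `run_external_job` and the `heavy_job` command are hypothetical names made up for illustration; the child Python process is only a stand-in for a real engine invocation such as `spark-submit`. The point is that the orchestrating side starts the job and checks its exit status, and never touches the data itself.

```python
import subprocess
import sys


def run_external_job(command):
    """Launch an external job and report whether it succeeded.

    The caller only starts the process and inspects its exit code;
    all of the actual computation happens inside the child process.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0, completed.stdout.strip()


# Hypothetical stand-in for a real processing job (assumption: in practice
# this would be something like a spark-submit command).
heavy_job = [sys.executable, "-c", "print(sum(range(1_000_000)))"]

succeeded, output = run_external_job(heavy_job)
print("job succeeded:", succeeded)
print("job output:", output)
```

Whether the external job reads a finished file or a live stream is decided entirely inside that job, which is why the orchestrator itself is neither "batch" nor "streaming".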

But you may notice that streaming workflows are more difficult to coordinate with Airflow. Workflows in Airflow are written as directed acyclic graphs (DAGs) of tasks: after one task completes, execution moves on to the tasks that depend on it. In stream processing there is no moment of "completion": all processes work continuously and in parallel.
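The "run the next task only after the previous one completes" model can be sketched with the standard library alone. This is not Airflow's scheduler, just a simplified illustration under my own assumptions: `run_workflow`, the task names, and the dependency map are all hypothetical, and tasks are run one at a time in topological order.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task only after all of its upstream tasks have returned.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    """
    finished = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()          # blocks until the task returns
        finished.append(name)  # reached only after the task has completed
    return finished


# Hypothetical batch pipeline: every step has a clear end.
batch_tasks = {
    "extract": lambda: None,
    "transform": lambda: None,
    "load": lambda: None,
}
batch_deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order = run_workflow(batch_tasks, batch_deps)
print(order)
```

If "transform" were a stream consumer running an endless `while True` loop, the call `tasks[name]()` would never return and "load" would never start. That is the coordination problem in miniature: a graph of dependent tasks needs each task to finish, and a stream has no finish.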

To summarize: you can use Airflow to "coordinate" stream processing, but you won't get any benefit from using it.

Makrushin Evgenii answered Sep 19 '25 07:09
