Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Airbnb Airflow vs Apache Nifi [closed]

Are Airflow and Nifi perform the same job on workflows? What are the pro/con for each one? I need to read some json files, add more custom metadata to it and put it in a Kafka queue to be processed. I was able to do it in Nifi. I am still working on Airflow. I am trying to choose the best workflow engine for my project Thank you!

like image 690
CMPE Avatar asked Sep 08 '16 19:09

CMPE


People also ask

What is the difference between Apache NiFi and Airflow?

Summary. By nature, Airflow is an orchestration framework, not a data processing framework, whereas NiFi's primary goal is to automate data transfer between two systems. Thus, Airflow is more of a “Workflow Manager” area, and Apache NiFi belongs to the “Stream Processing” category.

Is Apache NiFi an ETL?

Apache NiFi is an ETL tool with flow-based programming that comes with a web UI built to provide an easy way (drag & drop) to handle data flow in real-time. It also supports powerful and scalable means of data routing and transformation, which can be run on a single server or in a clustered mode across many servers.

What are the limitations of Apache airflow?

Airflow has a strict dependency on a specific time: the execution_date . No DAG can run without an execution date, and no DAG can run twice for the same execution date.

Is NiFi a good tool?

Apache NiFi is considered one of the best open-source ETL tools because of its well-rounded architecture. It's a powerful and easy-to-use solution. FlowFile includes meta-information, so the tool's capabilities aren't limited to CSV. You can work with photos, videos, audio files, or binary data.


1 Answers

For a great overview of Airflow and Apache NiFi checkout this reddit post: https://www.reddit.com/r/bigdata/comments/51mgk6/comparing_airbnb_airflow_and_apache_nifi/

For your specific use-case of ingesting Json files, enriching them and routing them to Kafka I believe NiFi is the right tool for the job. A couple of processors you could potentially use, as well as documentation for each, are below:

GetFile: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.GetFile/index.html

JoltTransformJSON: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.JoltTransformJSON/index.html

PublishKafka (or PublishKafka_0_10 depending on your version): https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-kafka-0-9-nar/1.9.2/org.apache.nifi.processors.kafka.pubsub.PublishKafka/index.html

like image 74
JDP10101 Avatar answered Sep 20 '22 05:09

JDP10101