
Apache NIFI for ETL

How effective is it to use Apache NiFi for an ETL process with HDFS as the source and Oracle DB as the destination? What are the limitations of Apache NiFi compared to other ETL tools such as Pentaho, DataStage, etc.?

arul lal Divakar asked Nov 07 '22 18:11



1 Answer

Main advantages of NiFi

The main advantages of NiFi:

  1. Intuitive GUI, which allows for easy inspection of the data
  2. Strong delivery guarantees
  3. Low latency, so it can support both batch and streaming use cases
  4. It can handle any format: it is not limited to SQL tables and can also move log files, etc.
  5. Schema aware, and can share schemas with solutions like Kafka, Flink, and Spark

Main limitation of NiFi

NiFi is really a tool for moving data around. You can enrich individual records, but it is typically described as doing 'EtL' with a small t. A typical thing that you would not want to do in NiFi is join two dynamic data sources.

For joining tables, tools like Spark, Hive, or classical ETL alternatives are often used.

For joining streams, tools like Flink and Spark Streaming are often used.
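To make the 'small t' distinction concrete, here is a minimal sketch in plain Python (not NiFi, Spark, or Flink code). All names and sample data are hypothetical. It contrasts a per-record enrichment against a static lookup, which needs only the record in hand, with a join of two datasets, which has to buffer one whole side before it can produce any output.

```python
from collections import defaultdict

# Hypothetical static lookup table: small and rarely changing.
COUNTRY_NAMES = {"NL": "Netherlands", "IN": "India"}


def enrich(record, lookup):
    """Per-record enrichment: needs only the record and a static lookup."""
    enriched = dict(record)
    enriched["country_name"] = lookup.get(record["country"], "unknown")
    return enriched


def hash_join(left_rows, right_rows, key):
    """Join of two datasets: the right side is fully buffered by key first."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data.
orders = [
    {"order_id": 1, "customer_id": 10, "country": "NL"},
    {"order_id": 2, "customer_id": 11, "country": "IN"},
]
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Ravi"},
]

enriched_orders = [enrich(order, COUNTRY_NAMES) for order in orders]
joined_orders = hash_join(orders, customers, "customer_id")
```

The enrichment handles each record independently, which suits a record-at-a-time flow. The join only works because `customers` is complete before matching starts; when both sides keep changing, that buffering and state management is the part usually delegated to the join-oriented tools mentioned above.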

Conclusion

NiFi is a great tool; you just need to make sure you use it for the right use case. Where needed, you can use other tools to complement it.


Full disclosure: I am an employee of Cloudera, the company that supports NiFi and other projects such as Spark and Flink. I have used other ETL tools before, but not to the same extent as NiFi.

Dennis Jaheruddin answered Nov 14 '22 23:11
