
Apache NiFi For Importing Data From RDBMS to HDFS - Performance Comparison with SQOOP

Tags:

apache-nifi

We are exploring Apache NiFi as a general purpose data ingestion tool for our enterprise requirements.

One typical data ingestion requirement is moving data from RDBMS systems to HDFS.

I was able to build an RDBMS-to-HDFS data movement flow in NiFi using the GenerateTableFetch and ExecuteSQL processors, and everything worked fine for smaller tables.
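To illustrate the idea behind this flow: GenerateTableFetch emits a series of paged SQL statements, and ExecuteSQL runs each one. The sketch below shows that paging idea only. The function name `page_queries`, the table `orders`, the key column `id`, and the `LIMIT`/`OFFSET` syntax are assumptions for illustration; the SQL that NiFi actually generates depends on the configured database type.

```python
def page_queries(table, key_column, row_count, page_size):
    """Return one SELECT statement per page of `page_size` rows.

    Hypothetical helper: mimics the paging idea, not NiFi's real output.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    queries = []
    for offset in range(0, row_count, page_size):
        # A stable ORDER BY keeps the pages from overlapping.
        queries.append(
            f"SELECT * FROM {table} ORDER BY {key_column} "
            f"LIMIT {page_size} OFFSET {offset}"
        )
    return queries


# Assumed example: a 25-row table split into pages of 10 rows.
queries = page_queries("orders", "id", row_count=25, page_size=10)
for q in queries:
    print(q)
```

Each generated statement would correspond to one flow file handed to ExecuteSQL, which is why the page size affects how many small result sets end up downstream.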


However, I couldn't test the flow on bigger tables because I was using a standalone distribution.

Has anyone done a performance comparison of NiFi with SQOOP for similar requirements?

Akhil asked Dec 03 '25 16:12


1 Answer

ExecuteSQL and ExecuteSQLRecord are a better choice. The former automatically converts result sets into an Avro sequence. The latter gives you more freedom in how you write the output (JSON, CSV, etc.). One nice thing about ExecuteSQL is that you can chain it with MergeRecord to combine multiple modest-sized result pages into a much bigger block of data, and MergeRecord can use the ParquetWriter to give you ready-made Parquet for writing to HDFS.
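As a rough sketch of what "combining modest-sized result pages into a bigger block" means, the Python below merges consecutive pages until a minimum record count is reached. The function `merge_pages` and the `min_records` parameter are hypothetical names for illustration; MergeRecord's real binning has more settings than this, so treat it as a picture of the idea rather than the processor's logic.

```python
def merge_pages(pages, min_records):
    """Combine consecutive pages until each block holds at least min_records rows."""
    blocks, current = [], []
    for page in pages:
        current.extend(page)
        if len(current) >= min_records:
            blocks.append(current)
            current = []
    if current:
        # Leftover rows that never reached the threshold still form a block.
        blocks.append(current)
    return blocks


# Assumed example: five result pages of 10 rows each.
pages = [[{"id": i} for i in range(start, start + 10)] for start in range(0, 50, 10)]
blocks = merge_pages(pages, min_records=20)
print([len(b) for b in blocks])
```

Fewer, larger blocks generally suit HDFS better than many small files, which is the motivation for merging before writing Parquet.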

Mike Thomsen answered Dec 06 '25 12:12


