We are exploring Apache NiFi as a general-purpose data ingestion tool for our enterprise requirements.
One typical data ingestion requirement is moving data from RDBMS systems to HDFS.
I was able to build an RDBMS-to-HDFS data movement flow in NiFi using the GenerateTableFetch and ExecuteSQL processors, and everything worked fine for smaller tables.

However, I couldn't test the flow on larger tables because I was using a standalone distribution.
Has anyone done a performance comparison of NiFi with Sqoop for similar requirements?
ExecuteSQL and ExecuteSQLRecord are a better choice. The former automatically converts result sets into Avro; the latter gives you more freedom in how the output is written (JSON, CSV, etc.). One nice thing about ExecuteSQL is that you can chain it with MergeRecord to combine multiple modest-sized result pages into a much larger block of data, and MergeRecord can use the ParquetRecordSetWriter to produce ready-made Parquet for landing in HDFS.
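The fetch-in-pages-then-merge pattern described above can be sketched outside NiFi. Below is a minimal Python illustration of the same windowing idea GenerateTableFetch applies with its maximum-value column, using SQLite as a stand-in for the source RDBMS; the table name, key column, and page size are made up for the example:

```python
import sqlite3

# Stand-in RDBMS: an in-memory SQLite table with 25 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 26)])

PAGE_SIZE = 10  # analogous to GenerateTableFetch's partition size


def fetch_pages(conn, last_max_id=0):
    """Yield result pages windowed on a monotonically increasing key,
    the same idea as GenerateTableFetch's maximum-value column."""
    while True:
        rows = conn.execute(
            "SELECT id, amount FROM orders WHERE id > ? "
            "ORDER BY id LIMIT ?", (last_max_id, PAGE_SIZE)).fetchall()
        if not rows:
            break
        yield rows
        last_max_id = rows[-1][0]  # remember the high-water mark


pages = list(fetch_pages(conn))          # three pages: 10, 10, 5 rows
merged = [row for page in pages for row in page]  # MergeRecord's role
```

In the NiFi flow, writing `merged` out as one Parquet file is what the record writer handles; the sketch only shows why merging the small pages first avoids producing many tiny files in HDFS.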