 

Apache NiFi - ConsumeKafka + MergeContent + PutHDFS to avoid small files

I have around 2,000,000 messages in a Kafka topic and I want to write them to HDFS using NiFi. I am using the ConsumeKafka_0_10 processor followed by PutHDFS, but this creates many small files in HDFS, so I added a MergeContent processor to combine records before writing the file. This works fine for a small number of messages, but for topics with a large volume of data it still writes a single file per record. Please advise if the configuration needs changes.

Thank you!!

Asked by BARATH on Jul 18 '18

1 Answer

The Minimum Number of Entries is set to 1, which means a bin is eligible to be merged with anywhere from 1 up to the Maximum Number of Entries flow files. Try setting the minimum to something much higher, like 100,000.
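As a sketch, a MergeContent configuration for this kind of flow might look like the following. The property names are standard MergeContent properties, but the values are illustrative assumptions you would tune to your message size and latency requirements, not settings taken from the original post:

```
Merge Strategy             : Bin-Packing Algorithm
Merge Format               : Binary Concatenation
Minimum Number of Entries  : 100000
Maximum Number of Entries  : 1000000
Minimum Group Size         : 128 MB      # align with HDFS block size
Max Bin Age                : 5 min       # flush a partial bin so data is not held indefinitely
Delimiter Strategy         : Text
Demarcator                 : \n          # newline between records (use Shift+Enter in the UI)
```

Setting a Max Bin Age is important once the minimums are raised: it guarantees a bin is still flushed after the configured time even if the topic stops producing before the minimum entry count or size is reached.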

Answered by Bryan Bende on Sep 22 '22