I am attempting to write a file to disk with the MergeContent processor, but the resulting file sizes vary significantly, anywhere from one line to 806 lines. I've repeated the process many times while trying to work out the newline demarcator, as addressed in "Apache NiFi MergeContent processor - set demarcator as new line", and the files still come out at seemingly random sizes.
What parameters do I need to set to adhere to the following logic?
To fully document, I currently have the following attributes defined:
As you can see, I've set "Max Bin Age" to "10 sec", following the syntax in https://github.com/apache/nifi/blob/31fba6b3332978ca2f6a1d693f6053d719fb9daa/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/test/java/org/apache/nifi/processors/standard/TestMergeContent.java#L219. That test is the only place I've managed to find an example of this value; the documentation seems incomplete on this parameter.
I've set "Maximum Number of Entries" to 5000, and "Maximum number of Bins" to 1.
What do I need to do to aggregate my records following the logic above? I also tried using the "Correlation Attribute Name" parameter with an attribute guaranteed to be identical on all documents reaching this point, and saw the same behavior.
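The most important setting here is actually "Minimum Number of Entries". The binning algorithm is lenient about how many items go into a bin: once a bin meets its minimum thresholds it is eligible to be merged, even if it is far below the maximum. With a low minimum (the default is 1), a bin can be merged with whatever happens to have arrived so far, which is why your file sizes vary so much.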
For your specific logic, you would want to leave things as they stand and:
Below is an image of the configuration above, where the minimum and maximum number of entries are both 5000 and only 1 bin is handled at a time. In this case you'll see that exactly 20000 files have been merged into 4.
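To see why the minimum matters, here is a small Python sketch of the binning rule described above. It is a simplified model, not NiFi's actual implementation: the function `simulate_merge` and the arrival pattern are made up for illustration, a single bin is assumed, and "Max Bin Age" is ignored entirely. A bin is merged as soon as it reaches the maximum, or at the end of a scheduling run if it already meets the minimum.

```python
from typing import List


def simulate_merge(arrivals: List[int], min_entries: int, max_entries: int) -> List[int]:
    """Return the size of every merged bundle produced by a single bin.

    arrivals[i] is the number of FlowFiles that arrive before the i-th
    scheduling run of this simplified, hypothetical processor model.
    """
    merged: List[int] = []
    bin_count = 0  # FlowFiles currently held in the single bin

    for batch in arrivals:
        remaining = batch
        while remaining > 0:
            # Fill the bin, but never beyond the maximum number of entries.
            take = min(remaining, max_entries - bin_count)
            bin_count += take
            remaining -= take
            if bin_count == max_entries:
                # A full bin is always merged.
                merged.append(bin_count)
                bin_count = 0
        # End of the run: a non-empty bin that meets the minimum is merged too.
        if bin_count > 0 and bin_count >= min_entries:
            merged.append(bin_count)
            bin_count = 0

    return merged


# Hypothetical arrival pattern; the batches sum to 20000 FlowFiles.
arrivals = [1, 806, 193, 4000, 5000, 10000]

lenient = simulate_merge(arrivals, min_entries=1, max_entries=5000)
strict = simulate_merge(arrivals, min_entries=5000, max_entries=5000)

print(lenient)  # [1, 806, 193, 4000, 5000, 5000, 5000]
print(strict)   # [5000, 5000, 5000, 5000]
```

With a minimum of 1, the bundle sizes simply follow whatever arrived between runs. With the minimum raised to match the maximum, the same 20000 FlowFiles come out as 4 bundles of 5000 each. In this model a partially filled bin waits indefinitely; in a real flow, "Max Bin Age" would eventually flush it.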