My current understanding is that NiFi processor properties are specific to that processor. So adding a new property to a processor will only be visible within that processor and not be passed on to later processor blocks?
This is why UpdateAttribute is necessary to add metadata that stays with the FlowFile as it traverses the data flow.
So what is the value in allowing the user to add custom properties in a processor beyond the ones defined and required for that processor to execute? Is it analogous to creating variables that can then be used in other properties?
A FlowFile is a logical notion that correlates a piece of data with a set of Attributes about that data. Such attributes include a FlowFile's unique identifier, as well as its name, size, and any number of other flow-specific values.
The FlowFile Repository acts as NiFi's Write-Ahead Log, so as the FlowFiles are flowing through the system, each change is logged in the FlowFile Repository before it happens as a transactional unit of work. This allows the system to know exactly what step the node is on when processing a piece of data.
A very good question and one that comes to everyone's mind when they start working on building data-flows in NiFi.
First things first: Properties vs FlowFile Attributes
As you mentioned in your question, Properties are used to control the behavior of your Processor, while Attributes are metadata of your flow-in-action.
A simple example: let's take the GetFile processor. The properties it exposes, like Input Directory, File Filter, etc., tell your processor where and how to look for the source data. When the processor finds source data matching your configuration, it initiates the flow, meaning a FlowFile is generated. This FlowFile carries the content of the source data plus some metadata about the source, such as the name of the file, its size, its last modified time, etc. This metadata can help your subsequent processors down the flow, for example by checking the file's type and routing the FlowFile accordingly. And mind you, the metadata is not fixed; it differs from processor to processor.
There are a few core attributes that every FlowFile carries, like filename, fileSize, uuid, path, etc.
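To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not the NiFi API: the `FlowFile` and `ToyGetFile` classes, the file name, and the directory are all made up for illustration, and the way `path` is filled in is a simplification of what the real GetFile does.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi FlowFile: content plus attributes (metadata)."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class ToyGetFile:
    """Toy stand-in for GetFile; hypothetical, not the real NiFi class."""

    def __init__(self, properties):
        # Properties configure the processor itself and stay on the processor.
        self.properties = dict(properties)

    def emit(self, filename, content):
        # Attributes are created per FlowFile and travel with it downstream.
        return FlowFile(
            content=content,
            attributes={
                "filename": filename,
                # Simplification: the real processor derives path differently.
                "path": self.properties["Input Directory"],
                "uuid": str(uuid.uuid4()),
            },
        )


getfile = ToyGetFile({"Input Directory": "/data/in", "File Filter": r".*\.csv"})
flowfile = getfile.emit("orders.csv", b"id,total\n1,9.99\n")

# The "File Filter" property never shows up among the FlowFile's attributes.
print(sorted(flowfile.attributes))
```

The point of the sketch is only that the property map lives on the processor object, while the attribute map lives on each FlowFile it emits.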
What is the purpose of letting users add custom properties when they are not added to the attributes?
It is a feature that NiFi offers to processors, which they can use or ignore. Not all processors allow custom properties to be added; only select processors do.
Let's take InvokeHTTP as an example. This processor allows the developer to create custom properties. When a user adds a new custom property, it is sent as a header on the HTTP call the processor makes, because the processor is built that way: it looks for any dynamic (custom) properties and, if any are present, treats them as custom headers the user wants to send.
At least in this processor's context, it doesn't make sense to capture this header data as metadata, because it may not be useful to subsequent processors. But certain other processors act differently when custom properties are provided, like UpdateAttribute, whose sole purpose is to add each custom property as an attribute on the incoming FlowFile.
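By contrast, the UpdateAttribute behavior can be pictured with a sketch like this one. Again this is a toy in plain Python, not the processor's real code: `update_attribute` and the attribute names are hypothetical, and the real processor offers more than this simple merge.

```python
def update_attribute(attributes, dynamic_properties):
    """Return a new attribute map with each dynamic property added or overwritten."""
    updated = dict(attributes)
    updated.update(dynamic_properties)
    return updated


incoming = {"filename": "orders.csv", "path": "/data/in"}

# A user-added property on the processor becomes an attribute on the FlowFile.
outgoing = update_attribute(incoming, {"source.system": "billing"})
print(outgoing)
```

Here the custom property does end up in the FlowFile's metadata, which is exactly why later processors can see it.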