Difference Between Processor Properties and Flowfile Attributes in Apache NiFi

My current understanding is that NiFi processor properties are specific to that processor. So is a new property added to a processor visible only within that processor, and not passed on to later processors?

This is why UpdateAttribute is necessary to add metadata that stays with the flowfile as it traverses through the data flow:

[Image: UpdateAttribute NiFi processor block]

So what is the value in allowing the user to add custom properties in a processor beyond the ones defined and required for that processor to execute? Is it analogous to creating variables that can then be used in other properties?

[Image: processor block properties]

Adam asked Jan 19 '19 18:01



1 Answer

A very good question and one that comes to everyone's mind when they start working on building data-flows in NiFi.

First things first: Properties vs FlowFile Attributes

As you mentioned in your question, Properties control the behavior of a processor, while Attributes are metadata that travel with each FlowFile as it moves through the flow.

As a simple example, let's take the GetFile processor. The properties it exposes, such as Input Directory and File Filter, tell the processor where and how to look for the source data. When the processor finds a source matching your configuration, it initiates the flow, meaning a FlowFile is generated. This FlowFile carries the content of the source data plus some metadata about the source, such as the name of the file, its size, and its last modified time. This metadata can help subsequent processors further down the flow, for example by checking the file's type and routing the FlowFile accordingly. Note that the metadata is not fixed; different processors add different attributes.
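The split described above can be sketched in a few lines of Python. This is a toy model, not NiFi's actual API: the `FlowFile` and `ToyGetFile` classes, the `demo` helper, and the `size` attribute name are all hypothetical and exist only to show that properties stay on the processor while attributes leave with the FlowFile.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class FlowFile:
    """Toy FlowFile: content plus a map of string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class ToyGetFile:
    """Hypothetical stand-in for a GetFile-like processor (not NiFi code)."""

    def __init__(self, input_directory, file_filter="*"):
        # Properties: configuration that lives on the processor only.
        self.properties = {
            "Input Directory": str(input_directory),
            "File Filter": file_filter,
        }

    def run(self):
        directory = Path(self.properties["Input Directory"])
        flowfiles = []
        for path in sorted(directory.glob(self.properties["File Filter"])):
            if not path.is_file():
                continue
            data = path.read_bytes()
            # Attributes: metadata attached to the FlowFile itself.
            attributes = {
                "filename": path.name,
                "path": str(path.parent),
                "size": str(len(data)),  # illustrative attribute name
            }
            flowfiles.append(FlowFile(content=data, attributes=attributes))
        return flowfiles


def demo():
    """Create two files, pick up only the CSV, and return the FlowFiles."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "a.csv").write_text("x,y\n1,2\n")
        (Path(tmp) / "b.txt").write_text("ignored")
        return ToyGetFile(tmp, file_filter="*.csv").run()
```

In this sketch, a downstream step only ever receives the `FlowFile` objects, so it can read `filename` but has no way to see `Input Directory` or `File Filter`.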

There are a few core attributes that every FlowFile carries, such as filename, path, and uuid.

What is the purpose of letting users add custom properties when they are not added to the attributes?

Custom properties are a feature that NiFi offers to processors, which each processor can use or ignore. Not all processors allow custom properties to be added; only some do, and each of those decides for itself what a custom property means.

Let's take InvokeHTTP as an example. This processor allows the user to create custom properties. When a user adds a new custom property, that property is added as a header to the HTTP call the processor makes, because the processor is built that way. It looks for any dynamic (custom) properties, and if any are present, it treats them as custom headers the user wants to send.
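A minimal sketch of that behavior, assuming a toy processor whose declared properties are listed in `KNOWN_PROPERTIES`. The function name `build_request`, the property names, and the example URL are illustrative assumptions, not InvokeHTTP's real implementation.

```python
# Properties the toy processor declares itself; anything else is "dynamic".
KNOWN_PROPERTIES = {"HTTP Method", "Remote URL"}


def build_request(properties):
    """Turn a property map into a request description.

    Declared properties configure the call; every other (dynamic)
    property is sent as an HTTP header with the same name and value.
    """
    headers = {
        name: value
        for name, value in properties.items()
        if name not in KNOWN_PROPERTIES
    }
    return {
        "method": properties["HTTP Method"],
        "url": properties["Remote URL"],
        "headers": headers,
    }


request = build_request({
    "HTTP Method": "GET",
    "Remote URL": "https://example.invalid/api",  # placeholder URL
    "X-Request-Source": "nifi-demo",              # dynamic property
})
```

Here `request["headers"]` contains only `X-Request-Source`, and nothing in the sketch writes that value onto a FlowFile, which mirrors why such a property does not show up as an attribute later in the flow.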

In this processor's context, it doesn't make sense to capture the header data as metadata, because it is unlikely to be useful to subsequent processors. Other processors act differently when custom properties are provided. UpdateAttribute is one of them: its sole purpose is to add each custom property as an attribute on the incoming FlowFile.
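The contrast with an UpdateAttribute-style processor can be sketched the same way. Again this is a toy model under stated assumptions: `update_attribute` is a hypothetical function, and attributes are represented as a plain dictionary rather than NiFi's FlowFile object.

```python
def update_attribute(flowfile_attributes, dynamic_properties):
    """Return a new attribute map with every dynamic property applied.

    Each dynamic property name becomes an attribute name, and an
    existing attribute with the same name is overwritten.
    """
    updated = dict(flowfile_attributes)  # leave the input map untouched
    updated.update(dynamic_properties)
    return updated


incoming = {"filename": "a.csv", "path": "/data/in"}
outgoing = update_attribute(incoming, {"source.system": "billing"})
```

Unlike the header case, the custom property here ends up in `outgoing`, so any later step that receives the FlowFile can read `source.system`.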

Sivaprasanna Sethuraman answered Sep 23 '22 08:09