
How Can Apache NiFi Flow Be Tested?

We have started using NiFi for many of our data pipeline jobs. One of the challenging things in NiFi is regression testing changes to the flows.

What are the common ways to handle unit and functional testing of NiFi flows? Are there any frameworks?

SunilS asked Jul 16 '19 17:07



1 Answer

There is a lot that can be written on this topic, but I'll try to keep it focused and brief.

  • Unit testing
    • The NiFi framework comes with extensive testing utilities for the framework itself as well as individual processors. You can examine the test code of any bundled processor to see common test patterns (testing a specific logic method vs. testing the execution of arbitrary flowfiles through the TestRunner mock execution). Many mock classes and services are available to streamline these tests. Example: TestEncryptContent
    • Groovy unit testing and Spock are also supported as test frameworks to allow for descriptive scenarios. Example: StandardHttpResponseMapperSpec
  • Integration testing
    • You can also build dynamic flows in test code (i.e. configure multiple processors and connections) and then pass in arbitrary data to evaluate behavior. Building the flow programmatically may take some time at first, but once complete, you'll have a repeatable flow definition you can use with many different input characteristics. Example: ITestHandleHttpRequest
    • You can test the application of variables, etc. on process groups. Example: StandardProcessGroupIT
    • You can use Docker containers to test dependent services such as MongoDB. Some OS-integration features are tested in containers using Testcontainers. Example: ShellUserGroupProviderIT
  • Smoke testing
    • You can have a special bucket in your NiFi Registry which contains "test flows" used to establish baselines on a new/upgraded NiFi instance. Perhaps one flow tries to exhaust memory, another network, another CPU via heavy processing, etc. You can deploy these versioned flows onto a new system and run them to determine performance in common known scenarios.
    • You can replay specific flowfiles through a flow after modifying it to gather more information during flow development, tighten the feedback loop, and verify expected behavior. See the NiFi User Guide section "Replaying a FlowFile".
    • You can use GenerateFlowFile to mock static or dynamic flowfile content and attributes, which you can feed into a process group where the "flow under test" is deployed. From the perspective of the flow under test, this is no different from a production scenario. When the flow is updated, the same GenerateFlowFile processor can be used to "verify" the new behavior; it can then be disabled and the "production" input connection dragged onto the same Input Port. More examples are in my presentation BYOP: Custom Processor Development with Apache NiFi (slides).
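
As a rough illustration of the baseline idea behind the smoke-testing points above (fixed inputs go in, outputs are compared with known-good results), here is a minimal sketch in plain Java. It deliberately uses no NiFi classes: `Sample`, `flowUnderTest`, and `findRegressions` are hypothetical names invented for this example, and the upper-casing "flow" is only a stand-in for whatever your real flow does. In NiFi itself the fixed inputs would come from something like GenerateFlowFile or replayed flowfiles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FlowRegressionSketch {

    // Hypothetical stand-in for a flowfile: content plus attributes. Not a NiFi class.
    static final class Sample {
        final String content;
        final Map<String, String> attributes;

        Sample(String content, Map<String, String> attributes) {
            this.content = content;
            this.attributes = new HashMap<>(attributes);
        }
    }

    // Hypothetical "flow under test": upper-cases the content and records its length.
    static Sample flowUnderTest(Sample in) {
        Map<String, String> attrs = new HashMap<>(in.attributes);
        attrs.put("content.length", String.valueOf(in.content.length()));
        return new Sample(in.content.toUpperCase(Locale.ROOT), attrs);
    }

    // Runs each fixed input through the flow and describes every mismatch with the baseline.
    static List<String> findRegressions(UnaryOperator<Sample> flow,
                                        List<Sample> inputs,
                                        List<Sample> baseline) {
        if (inputs.size() != baseline.size()) {
            throw new IllegalArgumentException("inputs and baseline must have the same size");
        }
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < inputs.size(); i++) {
            Sample actual = flow.apply(inputs.get(i));
            Sample expected = baseline.get(i);
            if (!expected.content.equals(actual.content)) {
                failures.add("input " + i + ": content differs");
            }
            if (!expected.attributes.equals(actual.attributes)) {
                failures.add("input " + i + ": attributes differ");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Sample> inputs = List.of(new Sample("hello", Map.of("source", "test")));
        List<Sample> baseline = List.of(
                new Sample("HELLO", Map.of("source", "test", "content.length", "5")));

        List<String> failures =
                findRegressions(FlowRegressionSketch::flowUnderTest, inputs, baseline);
        System.out.println(failures.isEmpty() ? "no regressions" : failures);
    }
}
```

The point of keeping inputs and baseline as data is that the same pair can be rerun unchanged after every flow modification, which is the role the reusable GenerateFlowFile processor plays in the approach described above.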
Andy answered Oct 01 '22 23:10