
Development Life Cycle for Apache NiFi

I realize that with NiFi, as its documentation puts it, "continuous improvement occurs in production", so it doesn’t lend itself to being used as a traditional development tool. However, for the project I’m working on it’s been decided that this is the tool we’ll be using, so I'd rather not debate its merits; I realize there are going to be some issues.

For example, if I push changes into an existing environment (from staging to production) and there were live edits in the destination, those edits will be overwritten. So I have questions about how to organize the development life cycle.

  • Is it possible to merge changes made by multiple developers in parallel (i.e., merge exported XML template files)? I’m guessing merging any significant changes could be difficult, but I have not attempted it.
  • How do you manage versioning changes? I’m assuming you could export your entire configuration as a template and check that into version control (see the export sketch after this list)?
  • How do you deploy a flow to a different server? Can you just stand up a stock NiFi deployment and then update it from your exported template (as mentioned above) using the NiFi REST API?
  • How do you manage deploying to different environments that might have different configuration? Would you have to update the template XML file? Or can I pull it in dynamically from something like ZooKeeper?
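
For reference, here is a minimal sketch of what I mean by exporting a template for version control, assuming a NiFi 1.x-style REST API (paths differ on 0.x); the instance URL and template name are placeholders:

    # Minimal sketch, assuming NiFi 1.x REST endpoints: list the templates on
    # an instance, download one by name, and write the XML to a file that can
    # be committed to version control. NIFI_URL and TEMPLATE_NAME are
    # hypothetical placeholders.
    import requests

    NIFI_URL = "http://localhost:8080/nifi-api"  # assumed local instance
    TEMPLATE_NAME = "my-flow"                    # hypothetical template name

    # List every template registered on this instance.
    templates = requests.get(f"{NIFI_URL}/flow/templates").json()["templates"]
    match = next(t for t in templates
                 if t["template"]["name"] == TEMPLATE_NAME)

    # Download the template XML and save it for a git commit.
    xml = requests.get(f"{NIFI_URL}/templates/{match['id']}/download").text
    with open(f"{TEMPLATE_NAME}.xml", "w") as f:
        f.write(xml)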
asked Jun 20 '16 by Mike



1 Answer

As the original author of the item you quoted and a member of the Apache NiFi PMC let me start by saying you're asking great questions and I can appreciate where you're coming from. We should probably improve the introduction document to better reflect the concerns you're raising.

You have it right that the current approach is to create templates of the flows, which you can then submit to version control. It is also the case that folks automate the deployment of these templates using scripts that interact with NiFi's REST API (a sketch of such a script follows). But we can and should do far more than we have to make the dataflow management job easier, regardless of whether you're a developer writing precisely what will be deployed or an operations-focused person having to put these pieces together yourself.
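
A deployment script along those lines might look like the sketch below. This is only an illustration assuming a NiFi 1.x REST API; the target URL, template file, and template name are placeholders for your environment:

    # Sketch: push an exported template XML into a target NiFi instance and
    # instantiate it on the root canvas. Assumes NiFi 1.x REST endpoints;
    # "prod-nifi" and "my-flow" are hypothetical.
    import requests

    TARGET = "http://prod-nifi:8080/nifi-api"  # hypothetical target server

    # Find the id of the target instance's root process group.
    root_id = requests.get(f"{TARGET}/process-groups/root").json()["id"]

    # Upload the template XML exported from the source environment.
    with open("my-flow.xml", "rb") as f:
        requests.post(f"{TARGET}/process-groups/{root_id}/templates/upload",
                      files={"template": f})

    # The upload endpoint responds with XML, so re-list the templates and
    # find the one we just uploaded by name.
    templates = requests.get(f"{TARGET}/flow/templates").json()["templates"]
    template_id = next(t["id"] for t in templates
                       if t["template"]["name"] == "my-flow")

    # Instantiate the template onto the canvas at origin (0, 0).
    requests.post(f"{TARGET}/process-groups/{root_id}/template-instance",
                  json={"templateId": template_id,
                        "originX": 0.0, "originY": 0.0})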

  1. Management and versioning of flows [1] should be easier and centrally managed so they can be shared across multiple clusters and environments [2].
  2. We need to make sure that environment-specific values are easily mapped into a given environment while the templates remain portable [3] (see the sketch after this list for an interim workaround).
  3. We need to make the multi-user/multi-tenant user experience far more intuitive and natural [4].
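
Until item 2 lands, one stopgap is to rewrite environment-specific values in the exported template XML as a deployment step. A minimal sketch, where the property keys, values, and file names are hypothetical and the <entry><key>/<value> layout should be verified against your own template exports:

    # Sketch: substitute per-environment property values into an exported
    # template XML before deploying it. ENV_OVERRIDES could be loaded from a
    # properties file or ZooKeeper; the values here are made up.
    import xml.etree.ElementTree as ET

    ENV_OVERRIDES = {
        "Hostname": "prod-db.example.com",  # hypothetical property values
        "Port": "5432",
    }

    tree = ET.parse("my-flow.xml")
    # Processor properties are serialized as <entry><key>..</key><value>..</value>.
    for entry in tree.iter("entry"):
        key = entry.findtext("key")
        if key in ENV_OVERRIDES:
            value = entry.find("value")
            if value is None:  # unset properties may omit the <value> element
                value = ET.SubElement(entry, "value")
            value.text = ENV_OVERRIDES[key]
    tree.write("my-flow.prod.xml")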

Elements of 1 and 2 will be present in the upcoming 1.0 release, and item 3 is fully covered in that release. In the meantime, for the multi-developer case I think it makes sense for each developer to treat their own local instance as a place for 'unit testing' their flow, and then to use a shared staging or production environment.

The key thing to keep in mind is that, for many flows and with NiFi's approach, it is OK to have multiple instances of a given flow template executing, each being fed the live feed of data. The results/output of a flow can be wired to actually get delivered somewhere or simply be grounded. In this way it is a lot like the mental model of branching in source control such as Git: you get to choose which instance you consider 'production' and which flow on the graph is simply an ongoing feature branch, if you will. For people coming from the more traditional approach this is not obvious, and we need to do more to describe and demonstrate it. However, we should also support more traditional approaches, and that is what some of the feature proposals I've linked to will enable.

[1] https://cwiki.apache.org/confluence/display/NIFI/Configuration+Management+of+Flows

[2] https://cwiki.apache.org/confluence/display/NIFI/Extension+Registry

[3] https://cwiki.apache.org/confluence/display/NIFI/Variable+Registry

[4] https://cwiki.apache.org/confluence/display/NIFI/Multi-Tentant+Dataflow

answered Oct 27 '22 by Joe Witt