
How can I pass a pandas dataframe in NiFi from processor to processor?

Using NiFi, I want to:

  1. run a Python script that exports a pandas dataframe
  2. send it, e.g. through ExecuteStreamCommand, to a variety of plug-and-play Python scripts that take a pandas dataframe as input and return one as output. These scripts do not know they are running via NiFi, and I cannot modify them to use STDIN/STDOUT instead of pandas.
  3. pass the output dataframe on for further processing.

Is this possible? If so, how?

Put another way:

  1. First Script: flowfile -> pandas
  2. Many Scripts: do stuff with pandas
  3. Last Script: pandas -> flowfile
Robbie asked Jan 18 '26 21:01



1 Answer

NiFi's ExecuteScript processor runs Python through Jython, which cannot load libraries that depend on native (C extension) code, and pandas is one of them. You therefore cannot do this directly inside ExecuteScript. Instead, I'd recommend writing a Python wrapper script that performs the following steps, and invoking it from NiFi with the ExecuteStreamCommand (ESC) processor:

Python wrapper script:

  1. Accept input from STDIN (this will be the flowfile content)
    • You can also put flowfile attributes on the command line as arguments using the "Command Arguments" property of the ESC processor
  2. Convert the STDIN input to a pandas dataframe
  3. Pass the dataframe between the arbitrary Python scripts which will have no knowledge of NiFi
  4. Write the final dataframe to STDOUT

With this setup, NiFi sends the incoming flowfile content to the wrapper script, the included scripts make all of their modifications internally, and whatever the wrapper writes to STDOUT becomes the outgoing flowfile content.
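As a rough illustration of the wrapper described above, here is a minimal sketch. It assumes the flowfile content is CSV with a header row, and the two step functions (`drop_empty_rows`, `add_total`) and their column names are hypothetical stand-ins for the real dataframe-in, dataframe-out scripts, which you would import instead:

```python
import io
import sys

import pandas as pd


def drop_empty_rows(df):
    """Hypothetical stand-in for one unmodifiable dataframe-in, dataframe-out script."""
    return df.dropna(how="all")


def add_total(df):
    """Second hypothetical step; the 'price' and 'qty' columns are assumptions."""
    out = df.copy()
    out["total"] = out["price"] * out["qty"]
    return out


# Order in which the dataframe is handed from script to script.
STEPS = [drop_empty_rows, add_total]


def run_pipeline(source, sink, steps=STEPS):
    """Read CSV from source, apply each step in order, write CSV to sink."""
    df = pd.read_csv(source)       # flowfile content -> pandas
    for step in steps:             # the steps only ever see a dataframe
        df = step(df)
    df.to_csv(sink, index=False)   # pandas -> flowfile content


def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # empty flowfile content: write nothing to STDOUT
    run_pipeline(io.StringIO(raw), sys.stdout)


if __name__ == "__main__":
    main()
```

If this were saved as, say, wrapper.py (a hypothetical name), ESC would be pointed at a regular CPython interpreter that has pandas installed, with the script path supplied through "Command Arguments". CSV is only one choice of interchange format here; any format that both NiFi and pandas can read and write would work the same way.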

Andy answered Jan 20 '26 10:01
