I want to send a NiFi FlowFile to Spark, do some transformations in Spark, and send the result back to NiFi so that I can do further operations there. I don't want the FlowFile written to a database or HDFS first and then have to trigger a Spark job. I want to send the FlowFile directly to Spark and receive the result directly back in NiFi. I tried using the ExecuteSparkInteractive processor in NiFi but I am stuck. Any examples would be helpful.
NiFi offers highly configurable and secure data flow between systems. Other features include data provenance, efficient data buffering, flow-specific QoS, and parallel streaming capabilities. Spark, on the other hand, speeds up the computation itself, regardless of the language used.
Apache NiFi is an ETL tool with flow-based programming that comes with a web UI built to provide an easy way (drag & drop) to handle data flow in real-time. It also supports powerful and scalable means of data routing and transformation, which can be run on a single server or in a clustered mode across many servers.
Apache Spark is a unified analytics engine for large-scale data processing with built-in modules for SQL, streaming, machine learning, and graph processing. Spark can run on Apache Hadoop, Apache Mesos, Kubernetes, on its own, in the cloud—and against diverse data sources.
You can't send data directly to Spark unless you use Spark Streaming. With traditional batch execution, Spark needs to read the data from some form of storage such as HDFS. The purpose of ExecuteSparkInteractive is to trigger a Spark job to run on data that has already been delivered to HDFS.
If you want to go the streaming route, there are two options...
1) Directly integrate NiFi with Spark streaming
https://blogs.apache.org/nifi/entry/stream_processing_nifi_and_spark
2) Use Kafka to integrate NiFi and Spark
NiFi writes to a Kafka topic, Spark reads from that topic, Spark writes its results back to a second Kafka topic, and NiFi reads from that second topic. This approach is probably the best option.
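As a rough sketch of option 2, here is what the Spark side of that Kafka round trip could look like with Spark Structured Streaming. The broker address, the topic names (`nifi-out` / `nifi-in`), and the per-record transformation (uppercasing a `name` field in a JSON payload) are all placeholder assumptions, not anything from your flow — swap in whatever your PublishKafka / ConsumeKafka processors actually use.

```python
import json

def transform_record(raw: str) -> str:
    # Pure per-record transformation: parse the JSON payload that NiFi
    # published and uppercase the (hypothetical) "name" field.
    record = json.loads(raw)
    record["name"] = record["name"].upper()
    return json.dumps(record)

def build_roundtrip_query(spark, brokers: str = "broker:9092"):
    # Wire Kafka -> Spark -> Kafka. Topic names are placeholders:
    # NiFi's PublishKafka writes to "nifi-out", and NiFi's ConsumeKafka
    # reads the transformed results from "nifi-in".
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    transform = udf(transform_record, StringType())

    src = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", brokers)
           .option("subscribe", "nifi-out")
           .load())

    return (src.select(transform(col("value").cast("string")).alias("value"))
            .writeStream
            .format("kafka")
            .option("kafka.bootstrap.servers", brokers)
            .option("topic", "nifi-in")
            .option("checkpointLocation", "/tmp/nifi-spark-ckpt")
            .start())
```

Keeping the transformation a plain Python function (applied via a UDF) makes it easy to unit test without a Kafka cluster.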
This might help: you can do everything from NiFi by following the steps below.
You need Livy set up to run Spark code from NiFi (through ExecuteSparkInteractive). Look at how to set up Livy and the NiFi controller services needed to use Livy within NiFi.
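To make the Livy route more concrete, here is a minimal sketch of the REST calls involved: ExecuteSparkInteractive (via the LivySessionController) essentially creates an interactive session (`POST /sessions`) and submits code as statements (`POST /sessions/{id}/statements`). The `LIVY_URL` value is an assumption; point it at your own Livy server, and note that `submit_statement` needs the third-party `requests` package.

```python
import json

LIVY_URL = "http://livy-host:8998"   # assumed Livy endpoint; adjust to your cluster

def build_session_request(kind: str = "pyspark") -> dict:
    # Body for POST /sessions -- creates an interactive Livy session,
    # which is what NiFi's LivySessionController manages for you.
    return {"kind": kind}

def build_statement(code: str) -> dict:
    # Body for POST /sessions/{id}/statements -- the same kind of call
    # ExecuteSparkInteractive makes with the processor's Code property.
    return {"code": code}

def submit_statement(session_id: int, code: str) -> dict:
    # Requires the third-party 'requests' package (pip install requests).
    import requests
    resp = requests.post(
        f"{LIVY_URL}/sessions/{session_id}/statements",
        data=json.dumps(build_statement(code)),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()
    return resp.json()  # contains the statement id; poll it for output
```

Seeing the raw payloads can help when debugging why ExecuteSparkInteractive is stuck — you can replay the same requests with curl against your Livy server.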
Good Luck!!