We are trying to run a Spark program using NiFi. This is the basic sample we tried to follow.
We have configured an Apache Livy server at 127.0.0.1:8998. An ExecuteSparkInteractive processor is used to run the sample Spark code:
val gdpDF = spark.read.json("gdp.json")
val gdpRDD = gdpDF.rdd
gdpRDD.count()
The LivyController is configured for 127.0.0.1, port 8998, with Session Type: spark.
When we run the processor, we get the following error:
Spark Session returned an error, sending the output JSON object as the flow file content to failure (after penalizing)
We just want to output the line count of the JSON file. How can we redirect it to a flowfile?
NiFi user log:
2020-04-13 21:50:49,955 INFO [NiFi Web Server-85] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://localhost:9090/nifi-api/flow/controller/bulletins (source ip: 127.0.0.1)
NiFi app.log:
ERROR [Timer-Driven Process Thread-3] o.a.n.p.livy.ExecuteSparkInteractive ExecuteSparkInteractive[id=9a338053-0173-1000-fbe9-e613558ad33b] Spark Session returned an error, sending the output JSON object as the flow file content to failure (after penalizing)
I have seen several people struggling with this example. I recommend following this example from the Cloudera Community (especially note part 2). https://community.cloudera.com/t5/Community-Articles/HDF-3-1-Executing-Apache-Spark-via-ExecuteSparkInteractive/ta-p/247772
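As a minimal sketch (not taken from the linked article), one way to get the count into the statement output is to print it as a small JSON string at the end of the code. This assumes the Livy "spark" session exposes the usual spark variable and that Livy captures printed output in the statement result; the helper name countAsJson is hypothetical. The error message in the question says the output JSON object is written to the flow file routed to failure, so the content of that flow file should show the actual Spark error. Since the code runs inside the Livy session, the relative path gdp.json is resolved there, not on the NiFi side.

```scala
// Hypothetical helper (not part of NiFi or Livy): formats a count
// as a one-field JSON object so the statement output is easy to parse.
def countAsJson(lineCount: Long): String =
  s"""{"lineCount": $lineCount}"""

// Stand-in value so the snippet runs on its own; in the processor the
// argument would be the DataFrame count instead (see the note below).
println(countAsJson(42L))
```

In the processor's Code property, the last line would instead be println(countAsJson(spark.read.json("gdp.json").count())), assuming gdp.json is readable from where the Spark driver runs.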
The key points I would be concerned with: