I'm looking for an API that gives access to the Spark Streaming statistics shown in the "Streaming" tab of the history server.
I'm mainly interested in the batch processing time, but according to the documentation it isn't directly available via the REST API: https://spark.apache.org/docs/latest/monitoring.html#rest-api

Any ideas on how to get the kind of information shown in the "Streaming" tab, either for a running job or from the history server?
There's a metrics endpoint available on the same port as the Spark UI on the driver node.
http://<host>:<sparkUI-port>/metrics/json/
Streaming-related metrics contain .StreamingMetrics in their names:
Sample from a local test job:
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingDelay: {
value: 30
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_processingStartTime: {
value: 1498124090001
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_schedulingDelay: {
value: 1
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_submissionTime: {
value: 1498124090000
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastCompletedBatch_totalDelay: {
value: 31
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingEndTime: {
value: 1498124090031
},
local-1498040220092.driver.printWriter.snb.StreamingMetrics.streaming.lastReceivedBatch_processingStartTime: {
value: 1498124090001
}
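The endpoint and the .StreamingMetrics naming convention can be combined in a short script. The sketch below is a minimal illustration, not an official client. It assumes the gauges sit under a top-level "gauges" key (the Dropwizard JSON layout Spark's metrics servlet uses) and falls back to the top-level object if that key is missing. The function names are hypothetical.

```python
import json
import urllib.request


def fetch_metrics(url: str, timeout: float = 5.0) -> dict:
    """Download the driver's metrics JSON, e.g. http://<host>:<sparkUI-port>/metrics/json/."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def streaming_gauges(metrics: dict) -> dict:
    """Return {metric name: value} for every gauge whose name contains '.StreamingMetrics.'."""
    # Assumption: gauges live under a top-level "gauges" key; if it is absent,
    # scan the top-level object instead.
    gauges = metrics.get("gauges", metrics)
    return {
        name: entry["value"]
        for name, entry in gauges.items()
        # Skip non-gauge entries such as a "version" string.
        if ".StreamingMetrics." in name and isinstance(entry, dict) and "value" in entry
    }
```

For example, `streaming_gauges(fetch_metrics("http://localhost:4040/metrics/json/"))` would query a driver whose UI runs on Spark's default port 4040. Adjust the host and port to match your driver.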
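To get the processing time, subtract StreamingMetrics.streaming.lastCompletedBatch_processingStartTime from StreamingMetrics.streaming.lastCompletedBatch_processingEndTime. Both values are epoch timestamps in milliseconds. In the sample above, 1498124090031 - 1498124090001 = 30 ms. This matches lastCompletedBatch_processingDelay, so that gauge already holds the processing time of the last completed batch.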
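The subtraction can be sketched as below. This assumes the metrics have already been flattened into a plain {full metric name: value} dictionary. Because the full names carry the application ID and name as a prefix, the sketch matches on the name suffix. The function name is hypothetical.

```python
def last_batch_processing_time_ms(gauges: dict) -> int:
    """Processing time of the last completed batch, in milliseconds.

    `gauges` maps full metric names to values, e.g.
    "<appId>.driver.<appName>.StreamingMetrics.streaming.lastCompletedBatch_processingEndTime" -> 1498124090031
    """
    def find(suffix: str) -> int:
        # Names are prefixed with the app ID and app name, so match on the suffix only.
        matches = [value for name, value in gauges.items() if name.endswith(suffix)]
        if len(matches) != 1:
            raise KeyError(f"expected exactly one metric ending in {suffix!r}, found {len(matches)}")
        return matches[0]

    end = find("StreamingMetrics.streaming.lastCompletedBatch_processingEndTime")
    start = find("StreamingMetrics.streaming.lastCompletedBatch_processingStartTime")
    return end - start
```

With the sample values above, this returns 30, the same value reported by lastCompletedBatch_processingDelay.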