Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sending Spark streaming metrics to open tsdb

How can I send metrics from my spark streaming job to open tsdb database? I am trying to use open tsdb as data source in Grafana. Can you please help me with some references where I can start.

I do see open tsdb reporter here which does similar job. How can I integrate the metrics from Spark streaming job to use this? Is there any easy options to do it.

like image 742
passionate Avatar asked Dec 05 '17 00:12

passionate


People also ask

How does Spark read streaming data?

Use readStream. format("socket") from Spark session object to read data from the socket and provide options host and port where you want to stream data from.

Can Spark process streaming data?

In fact, you can apply Spark's machine learning and graph processing algorithms on data streams. Internally, it works as follows. Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.

What method should be used to read streaming data into a DataFrame?

readStream() . In R, with the read. stream() method. Similar to the read interface for creating static DataFrame, you can specify the details of the source – data format, schema, options, etc.

How can I monitor my Spark performance?

Apache Spark monitoring provides insight into the resource usage, job status, and performance of Spark Standalone clusters. The Cluster charts section provides all the information you need regarding Jobs, Stages, Messages, Workers, and Message processing.


1 Answers

One way to send the metrics to opentsdb is to use it's REST API. To use it, simply convert the metrics to JSON strings and then utilize the Apache Http Client library to send the data (it's in java and can therefore be used in scala). Example code can be found on github.


A more elegant solution would be to use the Spark metrics library and add a sink to the database. There has been a discussion on adding an OpenTSDB sink for the Spark metrics library, however, finally it was not added into Spark itself. The code is avaiable on github and should be possible to use. Unfortunalty the code is compatible on Spark 1.4.1, however, in worst case it should still be possible to get some indications of what is necessary to add.

like image 196
Shaido Avatar answered Sep 22 '22 08:09

Shaido