Can I run a Time Series Database (TSDB) over Apache Spark?

I'm starting to learn about big data and Apache Spark, and I have a question.

In the future I'll need to collect data from IoT devices, and this data will arrive as time series data. I've been reading about Time Series Databases (TSDBs) and found some open-source options such as Atlas, KairosDB and OpenTSDB.

I definitely need Apache Spark, so I want to know: can I use a Time Series Database on top of Apache Spark? Does that make any sense? Please keep in mind that I'm very new to big data, Apache Spark and everything else I've mentioned in this question.

If I can run a TSDB on top of Spark, how can I achieve that?

asked Sep 11 '15 19:09 by Paladini


1 Answer

I'm an OpenTSDB committer. I know this is an old question, but I wanted to answer it. My suggestion would be to write your incoming data to OpenTSDB, assuming you just want to store the raw data and process it later. Then, from Spark, execute OpenTSDB queries using the OpenTSDB classes.

You can also write data with the classes. I think you want to use the IncomingDataPoint construct, but I don't have the details at hand at the moment. Feel free to contact me on the OpenTSDB mailing list if you have more questions.

You can see how OpenTSDB handles an incoming "put" request here; you should be able to do the same thing in your code for writes:

https://github.com/OpenTSDB/opentsdb/blob/master/src/tsd/PutDataPointRpc.java#L42
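A minimal sketch of the write side, with one swap made plainly: instead of calling OpenTSDB's Java classes the way PutDataPointRpc does internally, it sends JSON to the TSD's HTTP `/api/put` endpoint, which accepts the same metric/timestamp/value/tags data. It assumes an OpenTSDB 2.x TSD on its default port 4242 on localhost; the metric name `iot.sensor.temperature` and the tag `device` are hypothetical.

```python
import json
import urllib.request

# Assumption: a TSD listening on its default HTTP port (4242) on localhost.
TSD_URL = "http://localhost:4242"


def make_datapoint(metric, timestamp, value, tags):
    """Build one data point in the JSON shape accepted by OpenTSDB's /api/put."""
    if not tags:
        # OpenTSDB 2.x rejects data points that carry no tag pairs.
        raise ValueError("at least one tag is required")
    return {
        "metric": metric,
        "timestamp": int(timestamp),  # Unix epoch, seconds (or milliseconds)
        "value": value,
        "tags": dict(tags),
    }


def put_datapoints(points, base_url=TSD_URL):
    """POST a batch of data points; /api/put accepts a JSON array of them."""
    body = json.dumps(points).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/api/put",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # By default a successful put answers 204 No Content.
        return resp.status
```

With a TSD running, `put_datapoints([make_datapoint("iot.sensor.temperature", 1442000000, 21.5, {"device": "sensor-01"})])` should return 204. Sending several readings per request is generally cheaper than one request per reading.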

You can see the Splicer project submitting OpenTSDB queries here; I think a similar method could be used in your Spark project:

https://github.com/turn/splicer/blob/master/src/main/java/com/turn/splicer/tsdbutils/SplicerQueryRunner.java#L87
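On the read side, a similar sketch again uses the HTTP `/api/query` endpoint rather than the Java query classes Splicer calls. It builds a query body and flattens the JSON response (each returned series keeps its points in a `dps` map from timestamp string to value) into plain rows that Spark can load. The base URL, metric and tag names are assumptions for illustration.

```python
import json
import urllib.request


def make_query(metric, start, end=None, aggregator="sum", tags=None):
    """Build a request body for OpenTSDB's POST /api/query endpoint."""
    sub_query = {"aggregator": aggregator, "metric": metric}
    if tags:
        # e.g. {"device": "*"} returns one series per device value
        sub_query["tags"] = dict(tags)
    body = {"start": start, "queries": [sub_query]}
    if end is not None:
        body["end"] = end
    return body


def run_query(body, base_url="http://localhost:4242"):
    """POST the query to a TSD (assumed URL) and return the decoded JSON list."""
    req = urllib.request.Request(
        base_url + "/api/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def flatten_response(results):
    """Turn /api/query results into (metric, tags, timestamp, value) rows."""
    rows = []
    for series in results:
        tags = dict(series.get("tags", {}))
        # "dps" keys are timestamps as strings; sort them numerically.
        for ts, value in sorted(series["dps"].items(), key=lambda kv: int(kv[0])):
            rows.append((series["metric"], tags, int(ts), float(value)))
    return rows
```

Inside a PySpark job, something like `spark.createDataFrame(flatten_response(run_query(make_query("iot.sensor.temperature", "1h-ago", tags={"device": "*"}))), ["metric", "tags", "timestamp", "value"])` would give you a DataFrame to process further. For long time ranges you would probably split the query into time windows rather than pulling everything through the driver in one request.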

answered Oct 31 '22 03:10 by Jonathan Creasy