Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining/Updating Cassandra Queried data to Structured Streaming receieved from Kafka

I'm creating a Spark Structured streaming application which is going to be calculating data received from Kafka every 10 seconds.

To be able to do some of the calculations, I need to look up some information about sensors and placement in a Cassandra Database

I'm a little stuck at wrapping my head around how to keep the Cassandra data available throughout the cluster, and somehow update data from time to time, in-case we have done some changes to database table.

Currently, I'm querying the database as soon as I start the Spark locally using the Datastax Spark-Cassandra-connector

val cassandraSensorDf = spark
  .read
  .cassandraFormat("specifications", "sensors")
  .load

From here on I can use this cassandraSensorDs by joining it with my Structured Streaming Dataset.

.join(
   cassandraSensorDs ,
   sensorStateDf("plantKey") <=> cassandraSensorDf ("cassandraPlantKey")
)

How do I do additional queries to update this Cassandra data while having Structured Streaming Running? And how can I make the queried data available in a cluster setting?

like image 956
Martin Avatar asked Apr 16 '18 17:04

Martin


1 Answers

Using broadcast variables, you may write a wrapper to fetch data from Cassandra periodically and update a broadcast variable. Do a map-side join on the stream with the broadcast variable. I have not tested this approach and I think this might as well be an overkill depending on your use case(throughput).

How can I update a broadcast variable in spark streaming?

Another approach is to query Cassandra for every item in your stream, to optimise on the connections you should make sure that you use connection pooling and create only one connection for a JVM/partition. This approach is simpler you don't have to worry about warming the Cassandra data periodically.

spark-streaming and connection pool implementation

like image 194
Sudev Ambadi Avatar answered Oct 07 '22 17:10

Sudev Ambadi