
Dynamically update topics list for spark kafka consumer

Is it possible to dynamically update topics list in spark-kafka consumer?

I have a Spark Streaming application that uses the spark-kafka consumer. Say initially my spark-kafka consumer is listening for the topics ["test"], and after a while my topics list gets updated to ["test", "testNew"]. Is there a way to update the spark-kafka consumer's topics list and ask it to consume data for the updated list of topics without stopping the Spark Streaming application or the StreamingContext?

asked Nov 20 '22 17:11 by Rohith Yeravothula

1 Answer

Is it possible to dynamically update topics list in spark-kafka consumer?

No. In both the receiver-based and receiverless (direct) approaches, the topic set is fixed once you initialize the Kafka stream using KafkaUtils. The streaming DAG is fixed at that point, so there is no way for you to pass in new topics as you go.
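For context, this is roughly what stream creation looks like with the spark-streaming-kafka-0-10 direct API: the topic list handed to ConsumerStrategies.Subscribe is baked into the stream at creation time. The broker address, group id, and batch interval below are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.kafka.common.serialization.StringDeserializer

val conf = new SparkConf().setAppName("fixed-topics-example")
val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092", // placeholder broker
  "key.deserializer"  -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group"            // placeholder group id
)

// The topic set is fixed here. Adding "testNew" later requires
// stopping the StreamingContext and building a new stream.
val topics = Seq("test")
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
)
```

Once `ssc.start()` is called, the DStream lineage cannot be changed, which is exactly why the topic list cannot grow in place.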

If you want to read topics dynamically, consider instead a batch Kafka job that is scheduled to run iteratively: on each run it can read the current topic list and create an RDD out of that.
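A minimal sketch of that batch approach, assuming the spark-streaming-kafka-0-10 artifact. The topic discovery uses the Kafka consumer's listTopics call; the offset window here is a placeholder, since a real job would persist and resume its committed offsets:

```scala
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}
import scala.collection.JavaConverters._

def readCurrentTopics(sc: SparkContext, kafkaParams: Map[String, Object]): Unit = {
  // Discover whatever topics exist right now, skipping Kafka internals.
  val probe = new KafkaConsumer[String, String](kafkaParams.asJava)
  val topics = probe.listTopics().keySet().asScala.filterNot(_.startsWith("__"))
  probe.close()

  // Placeholder offset ranges (partition 0, offsets 0..100 per topic);
  // a real job would track offsets per partition between runs.
  val offsetRanges: Array[OffsetRange] =
    topics.toArray.map(t => OffsetRange(t, 0, 0L, 100L))

  // Batch read: each scheduled run sees the topic list as it is now.
  val rdd = KafkaUtils.createRDD[String, String](
    sc, kafkaParams.asJava, offsetRanges, LocationStrategies.PreferConsistent)

  rdd.map(_.value()).foreach(println)
}
```

Scheduling this function from cron, Airflow, or a simple loop gives you topic-list freshness at the cost of micro-batch latency rather than streaming latency.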

An additional solution would be to use a technology that gives you more flexibility over the consumption, such as Akka Streams.
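With Alpakka Kafka (the Akka Streams Kafka connector), for example, you can subscribe by topic pattern, so newly created topics matching the regex are picked up on the consumer's next metadata refresh without restarting anything. This is a sketch; the broker address and group id are placeholders, and it assumes an Akka version where an implicit ActorSystem provides the materializer:

```scala
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import org.apache.kafka.common.serialization.StringDeserializer

implicit val system: ActorSystem = ActorSystem("dynamic-topics")

val settings = ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
  .withBootstrapServers("localhost:9092") // placeholder broker
  .withGroupId("example-group")           // placeholder group id

// A regex subscription: "test", "testNew", and any future topic
// matching "test.*" are consumed as they appear.
Consumer.plainSource(settings, Subscriptions.topicPattern("test.*"))
  .map(record => record.value())
  .runForeach(println)
```

Note that plain Kafka consumers also support pattern subscription; what Akka Streams adds is backpressured, composable processing around that consumer.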

answered Jan 19 '23 00:01 by Yuval Itzchakov