How to update a large broadcast variable in a streaming use case?

Question

I have a streaming job that reads input data from a Kafka queue. I also have reference data of 1 million rows, which is updated every hour. I load the reference data in the driver and then broadcast it to the workers. I would like to update this broadcast variable (in the driver) and resend it to the workers.

What would be the best way to do this within Spark, without introducing HBase/Redis/Cassandra etc.?

And how reliable is this?

Do let me know if more information is needed. Thank you in advance. =)

Timofey Chernousov · Accepted Answer

A similar question was answered later here: How can I update a broadcast variable in spark streaming?

In short, you need to unpersist the broadcast variable, update the reference data, and rebroadcast it.
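The unpersist, update, and rebroadcast cycle could be sketched roughly as below. This is a minimal illustration, not the linked answer's code: the `RefreshableBroadcast` class, the `load_fn` loader, and the one-hour `ttl_seconds` default are all hypothetical names and values chosen for the example. The broadcast function is passed in as a parameter; in PySpark it would be `sc.broadcast`, whose result has an `unpersist()` method.

```python
import time


class RefreshableBroadcast:
    """Driver-side holder that re-broadcasts reference data once it is stale.

    Hypothetical helper for illustration only; it is not part of Spark.
    """

    def __init__(self, broadcast_fn, load_fn, ttl_seconds=3600.0,
                 clock=time.monotonic):
        self._broadcast_fn = broadcast_fn  # assumed: sc.broadcast in PySpark
        self._load_fn = load_fn            # assumed: returns the reference data
        self._ttl = ttl_seconds
        self._clock = clock
        self._current = None               # current broadcast handle
        self._loaded_at = None             # time of the last successful load

    def get(self):
        """Return the current broadcast handle, refreshing it if too old."""
        now = self._clock()
        stale = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if stale:
            # Load first: if loading fails, the old broadcast stays in place.
            fresh = self._load_fn()
            if self._current is not None:
                # Drop the cached copies of the old value on the executors.
                self._current.unpersist()
            # Broadcast the updated data under a new handle.
            self._current = self._broadcast_fn(fresh)
            self._loaded_at = now
        return self._current
```

With Spark Streaming, `get()` would typically be called inside the function passed to `foreachRDD` or `transform`, since that function runs on the driver once per batch, so the tasks of each batch pick up whichever handle is current at that point.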

PS: Formally, this question is not a duplicate, because it was posted earlier.

Tags:

apache-spark

Subba Rao
