 

What is the best way of sharing a dataset between the nodes in Apache flink?

I am using Apache Flink to process a stream of data, and I need to share an index among all the nodes that process the input data. The nodes update the index frequently.

From an efficiency standpoint, is it good practice to share the DataSet through broadcast variables?

Will the broadcast variable be updated on all nodes after each update?

Does Apache Flink update broadcast variables intelligently, i.e. incrementally with only the recent changes?

Ahmad.S asked Nov 09 '22 13:11



1 Answer

I think the solution is to use stateful functions backed by Flink's managed state, which you declare through state descriptors. If the state can't be partitioned, set the parallelism of your operator to one.
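The idea behind this suggestion can be illustrated without a Flink cluster. The sketch below is a plain-Python simulation, not Flink API: `KeyedIndexOperator`, `route` and `run` are hypothetical names. It assumes the index can be partitioned by a key, so each parallel instance keeps only its own slice of the index as per-key state and updates it locally instead of rebroadcasting the whole dataset. With `parallelism=1`, a single instance holds the entire index, which is the fallback for state that can't be partitioned.

```python
import zlib
from typing import Dict, List, Tuple


class KeyedIndexOperator:
    """Hypothetical stand-in for one parallel instance of a keyed operator.

    Each instance owns only the index entries for the keys routed to it,
    similar to how keyed state is scoped to the current key.
    """

    def __init__(self, subtask: int) -> None:
        self.subtask = subtask
        self.state: Dict[str, int] = {}  # key -> value, i.e. one "state cell" per key

    def process(self, key: str, value: int) -> int:
        # Read-modify-write of this key's entry only; no other instance
        # ever holds or needs a copy of it.
        updated = self.state.get(key, 0) + value
        self.state[key] = updated
        return updated


def route(key: str, parallelism: int) -> int:
    # Assumption: a stable hash picks the owning instance. Flink's real
    # assignment (via key groups) differs, but is likewise deterministic.
    return zlib.crc32(key.encode("utf-8")) % parallelism


def run(
    events: List[Tuple[str, int]], parallelism: int
) -> Tuple[List[KeyedIndexOperator], List[Tuple[str, int]]]:
    operators = [KeyedIndexOperator(i) for i in range(parallelism)]
    outputs: List[Tuple[str, int]] = []
    for key, value in events:
        # Every event for a given key reaches the same instance.
        op = operators[route(key, parallelism)]
        outputs.append((key, op.process(key, value)))
    return operators, outputs


if __name__ == "__main__":
    events = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    _, outputs = run(events, parallelism=3)
    print(outputs)  # [('a', 1), ('b', 2), ('a', 4), ('c', 4), ('b', 7)]
```

Because routing is deterministic per key, the results are the same for any parallelism, and each index entry lives in exactly one instance. In an actual Flink job, the per-key dictionary would correspond to state registered through a state descriptor (for example `ValueStateDescriptor`) inside a function applied after `keyBy(...)`.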

Eron Wright answered Dec 28 '22 14:12
