I am using Apache Flink to process a stream of data, and I need to share an index between all the nodes that process the input data. The nodes update the index frequently.
I would like to know whether it is good practice, from an efficiency standpoint, to share the Dataset through broadcast variables.
Will a broadcast variable be updated on all nodes after each update?
Does Apache Flink update broadcast variables incrementally, applying only the recent changes?
I think the solution lies in using stateful functions backed by Flink's managed state, which you register through state descriptors. If the state can't be partitioned by key, set the parallelism of your operator to one so that a single instance owns the whole index.
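To make the partitioning idea concrete, here is a minimal plain-Java sketch. It is only a model of the concept and does not use the Flink API: the class `PartitionedIndex` and its methods are hypothetical names I made up for illustration. Each shard stands in for the state owned by one parallel operator instance, and a key is always routed to the same shard, so no instance ever needs another instance's copy of the index. In Flink itself you would get this routing from `keyBy` together with keyed state declared through a descriptor such as `MapStateDescriptor`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a key-partitioned index (not Flink API).
 * Each shard plays the role of the state held by one parallel operator instance.
 */
final class PartitionedIndex {
    private final List<Map<String, Long>> shards = new ArrayList<>();

    PartitionedIndex(int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        for (int i = 0; i < parallelism; i++) {
            shards.add(new HashMap<>());
        }
    }

    /** Routes a key to exactly one shard, so updates for that key never conflict. */
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    /** Adds delta to the entry for key and returns the new value. */
    long update(String key, long delta) {
        return shards.get(shardFor(key)).merge(key, delta, Long::sum);
    }

    /** Reads the entry for key from the shard that owns it; 0 if absent. */
    long lookup(String key) {
        return shards.get(shardFor(key)).getOrDefault(key, 0L);
    }
}
```

With a parallelism of one, every key maps to shard 0, which is the fallback described above for an index that can't be split by key: correct, but with no parallel speed-up for that operator.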