Is global state with multiple workers possible in Flink?


Everywhere in the Flink docs I see that state is local to a map function and a worker. This seems powerful in a standalone setup, but what if Flink runs in a cluster? Can Flink handle a global state that all workers can add data to and query?

From the Flink article on state:

For high throughput and low latency in this setting, network communications among tasks must be minimized. In Flink, network communication for stream processing only happens along the logical edges in the job’s operator graph (vertically), so that the stream data can be transferred from upstream to downstream operators.

However, there is no communication between the parallel instances of an operator (horizontally). To avoid such network communication, data locality is a key principle in Flink and strongly affects how state is stored and accessed.

227

asked Jan 31 '18 08:01

bachrc

1 Answer

I think Flink only supports operator state and keyed state (state on keyed streams). If you need some kind of global state, you have to store it in, and read it back from, an external system such as a database, a file system, or shared memory, and combine that data with your stream.
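To make the external-store idea concrete, here is a minimal sketch in plain Java, not Flink API code. `InMemoryStore` is a hypothetical stand-in for a real database or cache, and the threads stand in for parallel workers; all names are assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedStoreSketch {

    // Hypothetical stand-in for an external database or cache (not a Flink class).
    static final class InMemoryStore {
        private final ConcurrentHashMap<String, Long> data = new ConcurrentHashMap<>();

        // Atomic read-modify-write, so concurrent workers do not lose updates.
        void add(String key, long delta) {
            data.merge(key, delta, Long::sum);
        }

        long get(String key) {
            return data.getOrDefault(key, 0L);
        }
    }

    // Every worker writes to the same store; returns the shared total at the end.
    static long runWorkers(int workers, int recordsPerWorker) {
        InMemoryStore store = new InMemoryStore();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                for (int i = 0; i < recordsPerWorker; i++) {
                    store.add("total", 1L);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store.get("total");
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000));
    }
}
```

With a real external system, each `add` and `get` would be a network round trip shared by all workers, which is where the cost of this approach comes from.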

Anyway, in my experience, with a good pipeline design and the right data partitioning, in most cases you should be able to apply divide-and-conquer algorithms or MapReduce strategies to achieve what you need.
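As a rough illustration of that partition-then-combine idea, here is a sketch in plain Java rather than Flink code. The class and method names are hypothetical, and the hash routing only mimics what a keyed stream would do: each key goes to exactly one worker, each worker keeps its own local state, and a downstream step merges the partial results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedCount {

    // Route each key to exactly one worker, so all records for a key share one local state.
    static int workerFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Each worker keeps its own map (local state); no worker reads another worker's map.
    static List<Map<String, Integer>> countLocally(List<String> records, int parallelism) {
        List<Map<String, Integer>> localStates = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            localStates.add(new HashMap<>());
        }
        for (String key : records) {
            localStates.get(workerFor(key, parallelism)).merge(key, 1, Integer::sum);
        }
        return localStates;
    }

    // Downstream step: merge the per-worker partial counts into one result.
    static Map<String, Integer> combine(List<Map<String, Integer>> localStates) {
        Map<String, Integer> result = new HashMap<>();
        for (Map<String, Integer> state : localStates) {
            state.forEach((k, v) -> result.merge(k, v, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(combine(countLocally(records, 3)));
    }
}
```

Because a key never appears in more than one worker's map, the counting step needs no coordination between workers; only the final merge sees all the data.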

If you introduce some kind of global state into your system, it can become a major bottleneck, so try to avoid it at all costs.

100

answered Sep 28 '22 08:09

diegoreico
