I wonder if it is possible in Flink to share the state among operators.
Say, for instance, that I have partitioning by key on an operator and I need a piece of state of partition A
inside partition C
(for any reason) (fig 1.a), or I need the state of operator C
in downstream operator F
(fig 1.b).
I know it is possible to broadcast
records to all partitions. So, if you include the internal state of an operator inside the records, you can share your internal state with downstream operators.
However, this could be an expensive operation instead of simply letting op1
specifically ask for op2
state.
Are the recent developments around queryable state moving towards this concept or they are meant only to let an external user query the internal state of the topology?
Thank you in advance for your insights
In general, Flink's design does not allow to read from or write to state of other subtasks of the same or different operators. As you said, you can use broadcast
to make state globally available. The queryable state features is intended for external user queries.
However, I heard of users who leveraged this features in an operator to fetch data from other operators of the same job. I don't know how well this works (stability and performance-wise). I would point you to the user mailing list for a more in-depth technical discussion if you would like to try this out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With