Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Updating/Refreshing side inputs data or passing some additional set of data that can be accessible in transformations while processing main input

I am trying to create a streaming dataflow pipeline. I need to access some additional data for processing main input received from pubsub. If I use side inputs then after some time period (like 1 day) I need to update the cached side input data. Or Is there any way to pass additional data to transformations in the form of list or map so that I can use third party cache manager to refresh this data.

Thanks.

like image 681
ganesh_patil Avatar asked Nov 23 '25 17:11

ganesh_patil


1 Answers

Here are some possibilities:

  1. If you can express the changes to your side input as an unbounded PCollection of changes - for example by subscribing to a change notification topic - then you should be able to add the updates to the PCollection that you are viewing as a side input, as long as you are tolerant of stale data. The update and caching of side inputs has no defined latency on updates, but will certainly be less than a day.

  2. If you do not have a change notification topic, you can still write your own UnboundedSource that Dataflow will poll for updates. This can be a bit more complex.

  3. Similar to an UnboundedSource, in Apache Beam (incubating) there is active work on supporting timers with callbacks in a DoFn, so you might like to follow BEAM-27.

  4. You could also drive your own cache on the incoming main input data.

Depending on other details, there may be other approaches. You may need to think about how to "wait" for the new value to be ready. For a particular window, any main input will wait for the side input to have a value for that window. But once the side input has one value, there is no more waiting, but just a best-effort eventual update.

like image 199
Kenn Knowles Avatar answered Nov 28 '25 01:11

Kenn Knowles



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!