My processing pattern is this: I have a stream of records, sharded by some ID, and each record gets enriched with some information A. A depends on the current record, the result of the previous calculation, and a large lookup table. The lookup table rarely changes, and the changes are minor. I know I can use mapWithState/flatMapWithState for stateful computations, but how should I handle the lookup table? The idiomatic way would be to treat it as state too (like A), but given its size that is probably terrible for performance and memory (e.g. when snapshotting).
I'm currently thinking of making it a shared resource protected by a reader/writer lock. Is there a better way of handling this kind of pattern?
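To make the shared-resource idea concrete, here is a minimal sketch in plain Java of a lookup table guarded by a reader/writer lock. The class name `SharedLookupTable`, its methods, and the `String` to `double` table shape are all assumptions made up for illustration; none of them come from the streaming framework. Note that an object like this lives outside the framework's managed state, so it would presumably not be included in snapshots and would have to be loaded separately in every JVM that runs the operator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical holder for a large, rarely changing lookup table that is
 * shared by all operator instances running in the same JVM.
 */
final class SharedLookupTable {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Double> table = new HashMap<>();

    SharedLookupTable(Map<String, Double> initial) {
        table.putAll(initial);
    }

    /** Read path: any number of threads may hold the read lock at once. */
    double lookup(String key, double fallback) {
        lock.readLock().lock();
        try {
            return table.getOrDefault(key, fallback);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Write path: a minor update blocks readers only while the entry is replaced. */
    void update(String key, double value) {
        lock.writeLock().lock();
        try {
            table.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Inside the stateful function, only the small per-ID result A would then be kept as state, while the table is read through `lookup`. Since updates are rare, another option under the same assumptions is to publish an immutable copy of the map through an `AtomicReference`, so that readers never block at all.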
Right now, the only way to do this is to keep the lookup table as state, as you mentioned. We are working on an alternative. Here are some of the ideas we have: https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit?usp=sharing