
How to handle large lookup tables that update rarely in Apache Flink

Tags:

apache-flink

I have a stream of records, sharded by some ID, and each record gets enriched with some information A. A depends on the current record, the result of the previous calculation, and a large lookup table. The lookup table rarely changes, and when it does the changes are minor. I know I can use mapWithState/flatMapWithState for stateful computations, but how should I handle the lookup table? The idiomatic way would be to treat it as state too (like A), but its size would probably be horrible for performance and memory (e.g. when snapshotting).

I'm currently thinking of making it a shared resource protected by a reader/writer lock. Is there a better way of handling this kind of pattern?
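For illustration, the shared-resource idea could look roughly like the plain Java sketch below. `SharedLookupTable` and the values in `main` are hypothetical names and data, not part of Flink, and the sketch assumes all readers run in the same JVM. Lookups take the read lock, so they can proceed concurrently, and the rare updates take the write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical helper, not a Flink API: a lookup table shared by all
// threads in one JVM and guarded by a reader/writer lock.
final class SharedLookupTable<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Hot path: any number of readers may hold the read lock at once.
    V get(K key) {
        lock.readLock().lock();
        try {
            return entries.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Rare path: the writer excludes readers only while the change is applied.
    void put(K key, V value) {
        lock.writeLock().lock();
        try {
            entries.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        SharedLookupTable<String, Integer> table = new SharedLookupTable<>();
        table.put("x", 10);

        int previousResult = 5; // stand-in for the per-ID state (previous A)
        int record = 2;         // stand-in for the current record

        // A depends on the record, the previous result, and the lookup table.
        int a = record + previousResult + table.get("x");
        System.out.println(a);

        table.put("x", 11);     // a minor, infrequent update
        System.out.println(table.get("x"));
    }
}
```

An object like this is shared only within a single JVM, so parallel instances running in other processes would each hold their own copy, and its contents would not be part of any state snapshot.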

asked Oct 19 '22 08:10 by gvd



1 Answer

Right now the only way to do this is with state, as you mentioned. We are working on an alternative; here are some of the ideas we have: https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit?usp=sharing

answered Oct 21 '22 10:10 by aljoscha
