How to handle large lookup tables that update rarely in Apache Flink

Question

My processing pattern is this: I have a stream of records that get enriched with some information A. The records are sharded by some ID. A depends on the current record, the result of the previous calculation, and a large lookup table. The lookup table doesn't change often, and the changes are minor. I know I can use mapWithState/flatMapWithState to do stateful computations. However, how should I handle the lookup table? The idiomatic way would be to also handle it as state (like A), but the size of the lookup table is probably horrible for performance and memory (e.g. when snapshotting).

I'm currently thinking of making it a shared resource protected by a reader/writer lock. Is there a better way of handling this kind of pattern?
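For illustration, the shared-resource idea could be sketched in plain Java roughly as follows. This is only a minimal sketch, not Flink API: the class `SharedLookupTable`, its methods, and the arithmetic in `enrich` are hypothetical names and logic made up for the example. It also assumes the table lives in a single JVM, so in a distributed job each worker process would hold its own copy, the table would not be part of any snapshot, and every update would have to be delivered to each worker separately.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical per-JVM lookup table guarded by a reader/writer lock. */
final class SharedLookupTable {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, Double> table = new HashMap<>();

    /** Reads take the shared lock, so many threads can look up concurrently. */
    Double get(String key) {
        lock.readLock().lock();
        try {
            return table.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Rare, small updates take the exclusive lock and block readers briefly. */
    void applyUpdate(Map<String, Double> changes) {
        lock.writeLock().lock();
        try {
            table.putAll(changes);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Made-up enrichment: combines the current record value, the previous
     * result for the same ID, and a factor from the lookup table
     * (defaulting to 1.0 when the key is absent).
     */
    double enrich(String lookupKey, double recordValue, double previousResult) {
        Double factor = get(lookupKey);
        return previousResult + recordValue * (factor == null ? 1.0 : factor);
    }
}
```

The previous result would still be kept as per-key state; only the lookup table sits outside it in this sketch.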

aljoscha · Accepted Answer

Right now the only possible way is by using state, as you mentioned. We are working on an alternative way of doing it. Here are some of the ideas we have: https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit?usp=sharing

Tags:

apache-flink

gvd
