Let's say I need to implement a custom sink using RichSinkFunction, and I need some variables, like a DBConnection, in the sink. Where should I initialize the DBConnection? Most of the articles I see initialize it in the open() method. Why not in the constructor?
A follow-up question: what kind of variables should be initialized in the constructor, and what should be initialized in open()?
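The constructor of a RichFunction is invoked only on the client side, when the job is being assembled. The function object is then serialized and shipped to the cluster, so anything that actually needs to be performed on the cluster, or that cannot be serialized, should be done in open().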
open() also needs to be used if you want to access the parameters of your Flink job or the RuntimeContext (for state, counters, etc.). When you use open(), you also want to use close() in a symmetric fashion, so that whatever open() acquires is released again.
So, to answer your question: your DBConnection should be initialized in open() only. In the constructor, you usually just store job-constant parameters in fields, such as how to access the key of your records, if your sink can be reused across multiple projects with different data structures.
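To make the split concrete, here is a minimal sketch that imitates the lifecycle with nothing but the JDK. It does not use Flink itself: `SketchSink`, `FakeConnection`, `LifecycleDemo` and the JDBC URL are hypothetical names made up for illustration, and `shipToWorker` only approximates the client-to-cluster hand-off with a Java serialization round trip. In a real sink you would extend RichSinkFunction and put the same logic into its open() and close() methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a real DB connection; deliberately NOT Serializable.
class FakeConnection {
    private boolean open = true;
    boolean isOpen() { return open; }
    void close() { open = false; }
}

// Imitates the shape of a rich sink: configuration in the constructor,
// resources acquired in open() and released in close().
class SketchSink implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String jdbcUrl;                 // job-constant, travels with the object
    private transient FakeConnection connection;  // never serialized, created in open()

    SketchSink(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;                   // runs on the "client" only
    }

    void open() {
        connection = new FakeConnection();        // runs on the "worker"
    }

    void close() {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }

    boolean isConnected() { return connection != null && connection.isOpen(); }
    String jdbcUrl() { return jdbcUrl; }
}

class LifecycleDemo {
    // Approximates shipping the function to the cluster: serialize, then deserialize.
    static SketchSink shipToWorker(SketchSink sink) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(sink);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SketchSink) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SketchSink onClient = new SketchSink("jdbc:example://host/db");
        SketchSink onWorker = shipToWorker(onClient);
        System.out.println(onWorker.isConnected()); // false: the constructor did not run here
        onWorker.open();
        System.out.println(onWorker.isConnected()); // true: open() created the connection
        onWorker.close();
        System.out.println(onWorker.isConnected()); // false: close() released it
    }
}
```

The copy that arrives on the worker keeps the constructor's `jdbcUrl` but has no connection until open() runs. If the connection were instead created in the constructor and held in a regular (non-transient) field, `writeObject` would fail with a NotSerializableException, which is the practical reason the constructor is the wrong place for it.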