I'm trying to figure out the proper way to run Apache Flink on Dataproc and use Google Pub/Sub as a source and sink. When I create a Dataproc cluster with the most recent image version, 1.4, and apply the Flink initialization action, Flink 1.6.4 is installed.
The problem is that flink-connector-gcp-pubsub is only available starting from Flink version 1.9.0.
So my question is: what is the proper way to use all of this together? Should I build my own GCE image with the latest Flink? Does one already exist?
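As you already said, flink-connector-gcp-pubsub is only available from Flink 1.9.0. So you have two options:
1. Build your own image with Flink 1.9.0 or later installed.
2. Implement a Pub/Sub connector yourself for Flink 1.6.4.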
I would not recommend implementing the connector yourself: it is a complex task that requires an in-depth understanding of Flink. Building your own image, on the other hand, should be relatively easy, given the existing example for Flink 1.6.4.
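To make the version constraint concrete, here is a minimal Java sketch of a hypothetical helper (not part of Flink or Dataproc) that checks whether a given Flink release meets the 1.9.0 minimum for flink-connector-gcp-pubsub. It assumes plain numeric `major.minor.patch` version strings; suffixes such as `-SNAPSHOT` are not handled.

```java
import java.util.Arrays;

/** Hypothetical helper: is a Flink release new enough for flink-connector-gcp-pubsub? */
public class PubSubConnectorCheck {
    // Minimum Flink version that ships flink-connector-gcp-pubsub.
    private static final int[] MIN_VERSION = {1, 9, 0};

    /** Parses "major.minor.patch" into three ints; missing parts default to 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        int[] result = new int[3];
        for (int i = 0; i < Math.min(parts.length, 3); i++) {
            result[i] = Integer.parseInt(parts[i]);
        }
        return result;
    }

    /** Lexicographic comparison of version components against the 1.9.0 minimum. */
    static boolean supportsPubSubConnector(String flinkVersion) {
        return Arrays.compare(parse(flinkVersion), MIN_VERSION) >= 0;
    }

    public static void main(String[] args) {
        // 1.6.4 is what the Dataproc 1.4 initialization action installs.
        for (String version : new String[] {"1.6.4", "1.9.0", "1.10.1"}) {
            System.out.println(version + " -> " + supportsPubSubConnector(version));
        }
    }
}
```

Comparing the parsed components numerically (rather than comparing the strings) matters here: as strings, "1.10.1" would sort before "1.9.0".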
I solved this problem by running Flink 1.9.0 on Kubernetes. This way I do not depend on anyone else's packaging and can run whatever version I need.