
What is the proper way to use Google Pub/Sub with Flink Streaming using Dataproc?

I'm trying to figure out the proper way to run Apache Flink on Dataproc and use Google Pub/Sub as a source/sink. When I create a Dataproc cluster and apply the Flink initialization action to the most recent image (1.4), Flink 1.6.4 gets installed.

The problem is that flink-connector-gcp-pubsub is only available starting from Flink version 1.9.0.

So my question is: what is the proper way to use all of this together? Should I build my own GCE image with the latest Flink? Does one already exist?

Viktor Ershov asked Mar 04 '23 10:03


2 Answers

As you already said, flink-connector-gcp-pubsub is only available from Flink 1.9.0 onward. So you have two options:

  • Implement the connector yourself
  • Build your own image based on the Flink initialization action

I would not recommend implementing the connector yourself: it is a complex task that requires an in-depth understanding of Flink. Building your own image, by contrast, should be relatively easy, given the existing example for Flink 1.6.4.

Gio Gogiashvili answered Mar 05 '23 23:03


I solved this problem by running Flink 1.9.0 in Kubernetes. This way I do not depend on anybody and can run whatever version I need.
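
For readers following the same route, here is a minimal sketch of how the connector might be declared once Flink 1.9.0 is available, assuming a Maven build. The `_2.11` Scala suffix is an assumption and should match the Scala version of the Flink distribution in use; nothing here is taken from the original setup.

```xml
<!-- Sketch only: add inside the <dependencies> section of the job's pom.xml. -->
<!-- The Scala suffix (_2.11) is an assumption; use the one matching your Flink build. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-gcp-pubsub_2.11</artifactId>
  <version>1.9.0</version>
</dependency>
```

Keeping the connector version identical to the Flink runtime version (1.9.0 here) is the safer choice, since Flink connectors are released together with the Flink version they target.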

Viktor Ershov answered Mar 06 '23 00:03