
GCP Dataflow vs Cloud Functions [duplicate]

I have an existing system where data is published to a Pub/Sub topic, read by a Cloud Functions subscriber, and pushed to BigQuery for storage (no additional transformation is done in the subscriber CF).
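For reference, a subscriber CF of this shape might look roughly like the sketch below (an assumption on my part, not the asker's actual code: the entry-point name, table spec, and payload format are placeholders, and the write uses the `google-cloud-bigquery` streaming-insert API):

```python
import base64
import json

TABLE_ID = "my-project.my_dataset.my_table"  # placeholder table spec


def decode_pubsub_event(event: dict) -> dict:
    """Decode the base64-encoded JSON payload of a Pub/Sub-triggered event."""
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)


def subscriber(event, context):
    """Cloud Functions entry point: push the message straight to BigQuery."""
    # Imported here so the decode helper can be exercised without the GCP SDK.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    row = decode_pubsub_event(event)
    client = bigquery.Client()
    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```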

Is it a good idea to replace my subscriber CF with a Dataflow streaming job using the Pub/Sub-to-BigQuery template? What are the pros and cons of each?
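For context, launching the Google-provided streaming template looks roughly like this (a sketch: the job name, region, subscription, and table spec are placeholders; check the current template documentation for the exact parameter names):

```shell
# Launch the Google-provided Pub/Sub subscription -> BigQuery template.
gcloud dataflow jobs run pubsub-to-bq \
  --region=us-central1 \
  --gcs-location=gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
  --parameters=inputSubscription=projects/my-project/subscriptions/my-sub,outputTableSpec=my-project:my_dataset.my_table
```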

Darshan Naik asked Mar 03 '23 05:03


1 Answer

It all depends on your use case and your data rate.

  • For sparse data published to the Pub/Sub topic, Cloud Functions works well and costs almost nothing.
  • For sustained traffic, you have to watch your processing time. A simple Dataflow job keeps only one VM up (a basic VM, n1-standard-1), and Cloud Functions' hourly price is higher than that of one n1-standard-1 VM. With concurrent messages, several function instances are spawned, which increases the processing cost.
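The trade-off above can be made concrete with a back-of-envelope calculation (a sketch only: the rates below are illustrative assumptions, not current GCP list prices, and real billing includes per-invocation and networking charges that are ignored here):

```python
def monthly_vm_cost(hourly_rate: float, hours: float = 730.0) -> float:
    """Cost of keeping one worker VM up all month (the Dataflow floor)."""
    return hourly_rate * hours


def monthly_cf_cost(invocations: int, seconds_per_call: float,
                    gb_second_rate: float, memory_gb: float = 0.25) -> float:
    """Rough Cloud Functions compute cost, counting GB-seconds only."""
    return invocations * seconds_per_call * memory_gb * gb_second_rate


# Illustrative rates (assumptions, not current GCP list prices):
VM_HOURLY = 0.05        # roughly an n1-standard-1-class machine, per hour
GB_SECOND = 0.0000025   # per GB-second of function compute

# Sparse traffic: 100k messages/month at 0.5 s each -> functions win.
sparse = monthly_cf_cost(100_000, 0.5, GB_SECOND)
# Sustained traffic: 200M messages/month -> the single-VM floor wins.
heavy = monthly_cf_cost(200_000_000, 0.5, GB_SECOND)
vm = monthly_vm_cost(VM_HOURLY)
```

With these (assumed) rates, the sparse workload costs pennies on functions while the always-on VM dominates; at sustained volume the fixed VM cost becomes the cheaper floor.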

You also have to take into account the ease of deploying a function (as opposed to Dataflow, where you have to drain your pipeline, stop it, and relaunch it) and Dataflow's ability to do much more, over a longer period of time (functions are limited in processing capability, and the processing of each message can't exceed 9 minutes).

Depending on your project's perspective, one solution or the other may be the better fit.

As a bonus, I have a third alternative: Cloud Run. Cloud Run is almost as easy as a function to update and deploy, the allowed processing duration is a little longer (15 minutes per request), and you can process several messages on the same instance; thanks to this factorization, the pricing can be far more attractive than with functions. If you are interested, have a look at this article that I wrote.
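The core of such a Cloud Run service is an HTTP handler for Pub/Sub push delivery. A minimal sketch of that handler logic, with the web-framework wiring omitted (the envelope shape follows Pub/Sub push delivery; the function name and return convention are my own):

```python
import base64
import json


def handle_push(envelope: dict):
    """Validate a Pub/Sub push envelope and decode its message.

    Returns (HTTP status, decoded payload or None). A real Cloud Run
    service would wrap this in a framework route, do the actual work
    where the payload is returned, and send the status back: any 2xx
    acknowledges the message, so Pub/Sub will not redeliver it.
    """
    message = envelope.get("message")
    if not isinstance(message, dict) or "data" not in message:
        return 400, None  # malformed push request
    payload = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return 204, payload
```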

guillaume blaquiere answered May 13 '23 19:05