Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to use Google Cloud PubSub and Run to handle resource-intensive long-running tasks?

I've got a Google Cloud PubSub topic which at times has thousands of messages and at times zero messages coming in. These messages represent tasks which can take upwards of an hour each. Preferably I'm able to use Cloud Run for this, as it scales really well to the demand, if a thousand messages gets published, I want 100s of Cloud Run instances to spin up. These Run instances get started by a push subscription. The problem is that PubSub has a 600 second timeout for the acknowledgement. This means in order to have Cloud Run process these messages they have to finish within 600 seconds. If they do not, PubSub times it out, and sends it again, causing the task to be restarted until the first task finally does acknowledge it (this causes the same task to be ran many times). Cloud Run acknowledges the messages by returning a 2** HTTP status code. The documentation states

When an application running on Cloud Run finishes handling a request, the container instance's access to CPU will be disabled or severely limited. Therefore, you should not start background threads or routines that run outside the scope of the request handlers.

So is it maybe possible to acknowledge a PubSub request through code and continue the processing, without having Google Cloud Run hand over the resources? Or is there a better solution I'm unaware of?

Because these processes are so code/resource-intensive, I feel Cloud Functions will not suffice. I've looked at https://cloud.google.com/solutions/using-cloud-pub-sub-long-running-tasks and https://cloud.google.com/blog/products/gcp/how-google-cloud-pubsub-supports-long-running-workloads. But these didn't answer my question. I've looked at Google Cloud Tasks, which might be something? But the rest of the project has been built around PubSub/Run/Functions, so preferably I stick with that.

This project is written in Python. So preferably I would like to write my Google Cloud Run tasks like this:

@app.route('/', methods=['POST'])
def index():
    """Endpoint for Google Cloud PubSub messages"""
    pubsub_message = request.get_json()
    logger.info(f'Received PubSub pubsub_message {pubsub_message}')
    if message_incorrect(pubsub_message):
        return "Invalid request", 400 #use normal NACK handling
    # acknowledge message here without returning

    # ...
    # Do actual processing of the task here
    # ...

So how can or should I solve this, so that the the resource-intensive tasks get properly scaled on demand ( so a push PubSub subscription ). And the tasks only get executed once.

Answers: In short what has been answered. Cloud Run and Functions are just not suited for this problem. There is no way to have them do tasks that take longer than 9 or 15 minutes respectively. The only solution is to switch over to another Google Service and use a pull style subscription and lose out on auto-scaling of GC Run/Functions

like image 535
thimo visser Avatar asked Aug 05 '19 19:08

thimo visser


People also ask

Can Pubsub trigger cloud run?

Leveraging service accounts and IAM permissions, you can securely and privately use Pub/Sub with Cloud Run, without having to expose your Cloud Run service publicly. Only the Pub/Sub subscription that you have set up is able to invoke your service.

What is the maximum retention duration for a message in pub sub?

The default message retention duration is 7 days and the default expiration period is 31 days. To create a subscription with retention of acked messages enabled, follow these steps: In the console, go to the Pub/Sub subscriptions page.

Is Google Pubsub push or pull?

Push subscription The Pub/Sub server sends each message as an HTTPS request to the subscriber client at a pre-configured endpoint. This request is shown as a PushRequest in the image. The endpoint acknowledges the message by returning an HTTP success status code.

Does Google Pubsub use gRPC?

Pub/Sub has both traditional REST/HTTP and gRPC interfaces.


2 Answers

Neither Cloud Functions nor Cloud Run is sufficient for arbitrarily long running operations. Cloud Functions has a hard cap of 9 minutes per invocation, and Cloud Run caps at 60. If you need more time, you're going to have to delegate the work to another product, such as Google Compute Engine. It should be possible to kick off some Compute Engine work from one of the serverless products.

Give the limits of pubsub acks, you'll probably have to find a way for a client to be able to poll or listen to some resource to find out when the work is actually done. You could use a database for that, and Cloud Firestore lets you listen to documents to find out when they change. So you could use that to track the status of your long-running work.

like image 51
Doug Stevenson Avatar answered Oct 05 '22 07:10

Doug Stevenson


Cloud Run on GKE can handle long process, more CPU and memory than available on managed platform. However, you have a GKE cluster always running and you loose the "pay-as-you-use" benefit.

If you want to use this solution, don't link directly PubSub push subscription to your Cloud Run on GKE. Use Cloud Task with HTTP job for this. The timeout is longer than PubSub (up to 24h instead of 10 min) and the retry policies are customizables.

like image 36
guillaume blaquiere Avatar answered Oct 05 '22 08:10

guillaume blaquiere