When the subscription queue is larger than the number of available instances, the expected behaviour is that delivery will fail and the message will be retried later.
What actually seems to be happening is that the logs fill with messages like:
{
  "textPayload": "The request was aborted because there was no available instance.",
  "insertId": "6109fbbb0007ec4aaa3855a9",
  ...
}
And the subscription messages are just dropped and not retried.
Is this the expected behaviour? It seems crazy to me, but if so, what architecture should you put in place to catch these dropped messages?
Edit: These errors started showing up in our logs on July 5, 2021 and cannot be found in logs before that date. Before then, the Pub/Sub/Cloud Functions combination worked as expected.
The error you are encountering is a known issue, and updates can be tracked through this Issue Tracker. You can also STAR the issue to receive automatic updates and give it traction by referring to this link. The tracker also discusses workarounds to mitigate the request aborts. Since you have already implemented retries with exponential backoff, please take a look at the other solutions provided there.
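Independently of that issue's resolution, one way to avoid losing messages outright is to attach a dead-letter topic and an explicit retry policy to the subscription that drives the function, so deliveries that exhaust their attempts are parked for inspection instead of disappearing. The following is a minimal sketch using the google-cloud-pubsub Python client; the project, subscription, and topic names are placeholders, and the attempt/backoff values are illustrative:

from google.cloud import pubsub_v1
from google.protobuf import duration_pb2, field_mask_pb2

project_id = "my-project"                 # placeholder project
subscription_id = "my-subscription"       # placeholder subscription feeding the function
dead_letter_topic_id = "my-dead-letters"  # placeholder topic for undeliverable messages

subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)
dead_letter_topic_path = publisher.topic_path(project_id, dead_letter_topic_id)

# Forward messages to the dead-letter topic after 10 failed delivery attempts,
# and back off between redeliveries (10 s growing to 600 s).
subscription = pubsub_v1.types.Subscription(
    name=subscription_path,
    dead_letter_policy=pubsub_v1.types.DeadLetterPolicy(
        dead_letter_topic=dead_letter_topic_path,
        max_delivery_attempts=10,
    ),
    retry_policy=pubsub_v1.types.RetryPolicy(
        minimum_backoff=duration_pb2.Duration(seconds=10),
        maximum_backoff=duration_pb2.Duration(seconds=600),
    ),
)

update_mask = field_mask_pb2.FieldMask(paths=["dead_letter_policy", "retry_policy"])

with subscriber:
    updated = subscriber.update_subscription(
        request={"subscription": subscription, "update_mask": update_mask}
    )
    print(f"Updated subscription: {updated.name}")

Note that for dead-lettering to take effect, the Pub/Sub service account needs publisher permission on the dead-letter topic and subscriber permission on the source subscription, and you still need a separate subscription on the dead-letter topic to inspect or replay the parked messages.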
If your concern is with Google Cloud Functions scalability in general, or if these errors require further investigation, please reach out to GCP support if you have a support plan. Otherwise, please open an issue in the issue tracker.