Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Throttling Azure Storage Queue processing in Azure Function App

I have created an Azure Function app with an Azure Storage Queue trigger that processes a queue in which each queue item is a URL. The Function just downloads the content of the URL. I have another function that loads and parses a site's XML Sitemap and adds all the page URLs to the queue. The problem I have is that the Functions app runs too quickly and it hammers the website so it starts returning Server Errors. Is there a way to limit/throttle the speed at which the Functions app runs?

I could, of course, write a simple web job that processed them serially (or with some async but limit the number of concurrent requests), but I really like the simplicity of Azure Functions and wanted to try out "serverless" computing.

like image 423
Alex Avatar asked Oct 17 '16 19:10

Alex


People also ask

How fast is azure storage queue?

Both from local and a WorkerRole inside the same data center, the inserts max out at 5 per second, and average 4.73 per second.

What is the difference between azure storage queue and service bus queue?

Storage queues provide a uniform and consistent programming model across queues, tables, and BLOBs – both for developers and for operations teams. Service Bus queues provide support for local transactions in the context of a single queue.

What is the service function of azure queue storage?

Azure Queue storage is a service for storing large numbers of messages that can be accessed from anywhere in the world via authenticated calls using HTTP or HTTPS. A single queue message can be up to 64 KB in size, and a queue can contain millions of messages, up to the total capacity limit of a storage account.


1 Answers

There are a few options you can consider.

First, there are some knobs that you can configure in host.json that control queue processing (documented here). The queues.batchSize knob is how many queue messages are fetched at a time. If set to 1, the runtime would fetch 1 message at a time, and only fetch the next when processing for that message is complete. This could give you some level of serialization on a single instance.

Another option might be for you to set the NextVisibleTime on the messages you enqueue in such a way that they are spaced out - by default messages that are enqueued become visible and ready for processing immediately.

A final option might be be for you to enqueue a message with the collection of all URLs for a site, rather than one at a time, so when the message is processed, you can process the URLs serially in your function, and limit the parallelism that way.

like image 109
mathewc Avatar answered Oct 08 '22 05:10

mathewc