I’m trying to understand how scaling works with Azure Functions. We’ve been testing with an app that generates 88 messages in a storage queue, which triggers our function. The function is written in C#. It downloads a file and performs some processing on it (it will eventually post the result back, but we aren’t doing that yet for testing purposes). The function takes about 30 seconds to complete per request (roughly 2,500 seconds of processing in total). For testing purposes we loop this 10 times.
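For context, here’s roughly what the function looks like. This is a minimal sketch only; the queue name, the URL-in-the-message format, and the plain HTTP download are simplified placeholders for our actual setup:

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Host;

public static class ProcessFileFunction
{
    // Simplified sketch: each queue message is assumed to carry the URL of a file to fetch.
    // "incoming-files" and the message shape are placeholders, not our real names.
    [FunctionName("ProcessFile")]
    public static async Task Run(
        [QueueTrigger("incoming-files")] string fileUrl,
        TraceWriter log)
    {
        using (var client = new HttpClient())
        {
            byte[] data = await client.GetByteArrayAsync(fileUrl);
            log.Info($"Downloaded {data.Length} bytes, starting ~30s of processing");
            // ... memory-intensive processing happens here; posting the result
            // back is skipped while we test ...
        }
    }
}
```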
Our ideal situation would be that, after some warm-up, Azure would automatically scale the function out to handle the messages in the most expedient way, using some sort of algorithm that takes spin-up time and so on into account, or simply scale out to the number of messages in the backlog, with some sort of cap.
Is this how it is supposed to work? We have never been able to get over 7 ‘consumption units’, and it generally takes about 45 minutes to process the queue of messages.
A couple of other questions re scalability… Our function performs a memory-intensive operation; how is memory ‘shared’ across scaled instances of a function? I ask because we are seeing some out-of-memory errors that we don’t normally see. We’ve configured the function for the maximum memory (1536 MB), and about 2.5% of the operations are failing with an out-of-memory error.
Thanks in advance. We’re really hoping to make this work, as it would allow us to move a lot of our work off of dedicated Windows VMs on EC2 and onto Azure Functions.
The Azure Functions infrastructure scales CPU and memory resources by adding additional instances of the Functions host, based on the number of incoming trigger events. It is event driven and scales out automatically, even during periods of high load.
Parallel execution: Each instance of the function app, whether the app runs on the Consumption hosting plan or a regular App Service hosting plan, might process concurrent function invocations in parallel using multiple threads.
App Service Plan: With this plan, virtual machines are always running, so you never have to worry about cold starts. This is ideal for long-running operations, or when more predictable scaling and costs are required.
The intent is that the platform takes care of automatically scaling for you with the ultimate goal that you don't have to think or care about the number of "consumption units" (sometimes referred to as instances) that are assigned to your function app. That said, there will always be room for improvement to ensure we get this right for the majority of users. :)
But to answer your question about the internal details (as far as queue processing goes), what we have in place right now is a system which examines the queue length and the amount of time each message sits in the queue before being processed by your app. If we feel like your function app is "falling behind" in processing these messages, then more consumption units will be added until we think your app is able to keep up with the incoming load.
One thing that's very important to mention is that there is another aspect of scale besides just the number of consumption units. Each consumption unit has the ability to process many messages in parallel. Oftentimes we see that the problem people have is not the number of allocated consumption units, but the default concurrency configuration for their workload. Take a look at the batchSize and newBatchThreshold settings, which can be tweaked in your host.json file. Depending on your workload, you may find that you get significantly better throughput when you change these values (in some cases, reducing concurrency has been shown to dramatically increase throughput). For example, you may observe this if each function execution requires a lot of memory or if your functions depend on an external resource (like a database) which can only handle limited concurrent access. More documentation on these concurrency controls can be found here: https://github.com/Azure/azure-webjobs-sdk-script/wiki/host.json.
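For example, a host.json along these lines would reduce per-instance queue concurrency (the exact values are purely illustrative; the right numbers depend on your workload):

```json
{
  "queues": {
    "batchSize": 4,
    "newBatchThreshold": 2
  }
}
```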
As I hinted at above, playing with per-consumption unit concurrency may help with the memory pressure issues you've been encountering. Each consumption unit has its own pool of memory (e.g. its own 1.5 GB). But if you're processing too many messages in a single consumption unit, then that could be the source of the out-of-memory errors you're seeing.
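To make that concrete (using the defaults documented in the host.json wiki linked above, so treat the exact figures as approximate): with batchSize at its default of 16 and newBatchThreshold at 8, a single consumption unit can have on the order of 24 messages in flight at once, which leaves only about 64 MB of the 1536 MB per execution. Dropping batchSize to something like 2–4 gives each in-flight message far more memory headroom.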
With all this said, we are constantly doing work to identify and optimize certain load scenarios which we think are the most common, whether it's draining a pile of messages from a queue, consuming a "stream" of blobs in a storage container, processing a flood of HTTP requests, etc. Expect things to change as we learn, mature, and get more feedback from folks like yourself. The best place to provide such feedback to the product group is in our GitHub repo's issue list, which is reviewed regularly.
Thanks for the question. I hope this information was helpful and that you're able to get the numbers you're looking for.