We use Lambda to power APIs (via API Gateway) that are called by news media websites and receive high but fluctuating traffic. We began experiencing throttles, so we raised our concurrency limit to 2000. However, we are still throttled multiple times per day.
Oddly, CloudWatch metrics show concurrent executions peaking at around 600 or lower when we're throttled. See this CloudWatch chart as an example:
Has anyone experienced this before? Why do you think this is happening? What can we do about it?
More Information
Additionally, here's another chart showing total invocation count and average duration over the same time period. It's hard to tell what is causal: is duration up because of throttling, or is it the other way round (some of our functions do call other functions)? Please check which axis each series uses, because the scales are quite different.
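One way to relate the two charts is Little's law, which the Lambda concurrency documentation also uses for estimates: average concurrency ≈ requests per second × average duration in seconds. The sketch below is a back-of-the-envelope helper under that assumption. The function names and the example numbers are hypothetical and are not taken from the charts in the question.

```python
def rate_from_invocations(invocations: int, period_seconds: float) -> float:
    """Convert an Invocations sum over one metric period into requests per second."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return invocations / period_seconds


def estimate_concurrency(requests_per_second: float, avg_duration_seconds: float) -> float:
    """Little's law: average in-flight executions = arrival rate x time each one runs."""
    if requests_per_second < 0 or avg_duration_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    return requests_per_second * avg_duration_seconds


# Hypothetical numbers: 60,000 invocations in a 1-minute period.
rate = rate_from_invocations(60_000, 60)  # 1000 requests per second
print(estimate_concurrency(rate, 0.3))    # ~300 concurrent executions at 300 ms
print(estimate_concurrency(rate, 0.6))    # ~600: doubling duration doubles concurrency
```

At a fixed request rate, any increase in duration raises concurrency proportionally. This matters for functions that synchronously invoke other functions: the caller keeps running (and counts toward concurrency) while it waits, so a slow downstream function inflates both its own concurrency and the caller's.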
When the burst concurrency limit is reached, the function starts to scale linearly. If this isn't enough concurrency to serve all requests, additional requests are throttled and should be retried. The function continues to scale until the account's concurrency limit for the function's Region is reached.
To increase your account's Lambda concurrency limit for a Region, you must open a quota increase case in the Service Quotas dashboard. For more information, see Lambda function scaling and Managing concurrency for a Lambda function. Important: Increasing your concurrency limit can add cost to your AWS account.
At the highest level, throttling just means that Lambda intentionally rejects some of your requests. From the client side, a call to Lambda then fails with a throttling exception, which you need to handle. Typically, people handle this by backing off for some time and retrying.
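A minimal sketch of that back-off-and-retry pattern, using capped exponential backoff with "full jitter". `ThrottledError` and `call_with_backoff` are hypothetical names; in real code you would map your client's throttling error onto the retry condition. For example, with boto3 a throttled Lambda `Invoke` typically surfaces as a botocore `ClientError` with the error code `TooManyRequestsException`, and botocore's own retry configuration may already retry it, so verify both against your SDK version.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for the throttling error your client raises."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ThrottledError with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except ThrottledError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: let the caller see the throttle
            # The backoff window doubles each retry (0.1s, 0.2s, 0.4s, ...) up to max_delay.
            # Full jitter sleeps a random time inside that window so that many
            # throttled clients don't all retry at the same instant.
            window = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, window))


# Usage: a fake call that is throttled twice and then succeeds.
attempts = {"n": 0}


def flaky_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottledError("Rate exceeded")
    return "ok"


print(call_with_backoff(flaky_call, sleep=lambda seconds: None))  # prints "ok"
```

The jitter is the important design choice here: if every throttled client retried after the same fixed delay, the retries would arrive as another synchronized spike.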
I think this has to do with Lambda concurrency burst limits.
Basically, there's a limit on how many instances of your Lambda function can run concurrently under sudden load, and this limit is separate from the overall per-Region Lambda concurrency limit.
You can find more information about it here:
https://docs.aws.amazon.com/lambda/latest/dg/scaling.html
The relevant part:
AWS Lambda dynamically scales function execution in response to increased traffic, up to your concurrency limit. Under sustained load, your function's concurrency bursts to an initial level between 500 and 3000 concurrent executions that varies per region. After the initial burst, the function's capacity increases by an additional 500 concurrent executions each minute until either the load is accommodated, or the total concurrency of all functions in the region hits the limit.
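The quoted scaling rule can be modelled as a ceiling that starts at the Region's initial burst and grows by 500 per minute until it hits the account limit. This is a simplified sketch: the function name is hypothetical, the 500 initial burst in the example is one point in the quoted 500 to 3000 range, and the per-minute step is taken from the quoted documentation, so check the current scaling page for your Region's actual values.

```python
import math


def available_concurrency(
    minutes_since_burst: float,
    initial_burst: int,
    account_limit: int,
    step_per_minute: int = 500,
) -> int:
    """Concurrency ceiling after a sudden traffic spike, per the quoted scaling rule.

    Starts at the initial burst level and grows by step_per_minute for each full
    minute elapsed, never exceeding the Region's account concurrency limit.
    """
    if minutes_since_burst < 0:
        raise ValueError("minutes_since_burst must be non-negative")
    full_minutes = math.floor(minutes_since_burst)
    return min(account_limit, initial_burst + step_per_minute * full_minutes)


# Example: a hypothetical 500 initial burst with the 2000 account limit from the question.
for minute in range(5):
    print(minute, available_concurrency(minute, initial_burst=500, account_limit=2000))
# 0 500
# 1 1000
# 2 1500
# 3 2000
# 4 2000
```

Under this model, a spike that suddenly needs, say, 1800 concurrent executions is partly throttled for the first few minutes even though the 2000 account limit is never reached. That would be consistent with throttles appearing while charted concurrency stays around 600, although the model alone doesn't prove it.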