AWS Lambda execution duration randomly spikes and causes time-outs

I'm building a serverless web-tracking system that serves its tracking pixel through AWS API Gateway. Each tracking request invokes a Lambda function, which writes the tracking event to a Kinesis stream.

The Lambda function itself does not do anything fancy. It just takes the incoming event (its own argument) and writes it to the stream. Essentially, it's just:

import boto3
kinesis_client = boto3.client("kinesis")

kinesis_stream = "my_stream_name"

def return_tracking_pixel(event, context):
    ...
    new_record = ...(event)
    kinesis_client.put_record(
        StreamName=kinesis_stream,
        Data=new_record,
        PartitionKey=...
    )
    return ...

Sometimes I see a strange spike in the Lambda execution duration that causes some of my function invocations to time out, so the corresponding tracking requests are lost.

This is the graph of 1-minute invocation counts of the Lambda function in the affected time period:

[graph: Lambda invocation count per minute]

Between 20:50 and 23:10 I suddenly see many invocation errors (1-minute error counts):

[graph: Lambda invocation error count per minute]

which are obviously caused by the Lambda execution time-out (maximum duration in 1-minute intervals):

[graph: maximum Lambda execution duration per minute]

Nothing unusual is going on with either my Kinesis stream (data-in, number of put records, put_record success count, etc. all look normal) or my API GW (the number of Lambda invocations matches the number of API GW calls, which is well within the API GW limits).

What could be causing the sudden (and seemingly randomly occurring) spike in the Lambda function execution duration?

EDIT: The Lambda functions are not being throttled either, which was my first idea.
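For readers who want to repeat the throttling and duration checks outside the console, here is a minimal sketch that pulls a per-minute Lambda metric from CloudWatch. The helper name `lambda_metric_per_minute`, the function name `my_function_name`, and the 3-hour window are assumptions made for illustration; `get_metric_statistics` is the standard boto3 CloudWatch call, and the client is passed in so the helper can be exercised without AWS access.

```python
from datetime import datetime, timedelta, timezone


def lambda_metric_per_minute(cloudwatch, function_name, metric_name, statistic, hours=3):
    """Return (timestamp, value) pairs for one AWS/Lambda metric at 1-minute resolution.

    `cloudwatch` is expected to be a boto3 CloudWatch client, e.g.
    boto3.client("cloudwatch"). This helper is a hypothetical sketch,
    not part of the original function.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=[statistic],
    )
    # Datapoints are not guaranteed to come back in time order, so sort them.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p[statistic]) for p in points]


# Example usage (assumed names):
#   import boto3
#   cw = boto3.client("cloudwatch")
#   lambda_metric_per_minute(cw, "my_function_name", "Throttles", "Sum")
#   lambda_metric_per_minute(cw, "my_function_name", "Duration", "Maximum")
```

Comparing the `Throttles` sum with the `Duration` maximum over the same window is one way to see whether the spikes line up with throttling or happen without it.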

773

asked Jan 18 '17 10:01

grepe

1 Answer

Just to add my 2 cents, because not much investigation is possible without extra logging or some X-Ray analysis.
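As one possible form of that extra logging, the sketch below wraps the Kinesis call and prints a JSON line with the elapsed time and the time the invocation has left. The wrapper name `timed_put_record` and the log field names are assumptions; `get_remaining_time_in_millis()` and `aws_request_id` are part of the Lambda Python context object, and anything printed ends up in the function's CloudWatch log stream.

```python
import json
import time


def timed_put_record(kinesis_client, context, **put_record_kwargs):
    """Call put_record and log how long it took (hypothetical helper)."""
    started = time.monotonic()
    outcome = "ok"
    try:
        return kinesis_client.put_record(**put_record_kwargs)
    except Exception as exc:
        # Record the exception type, then let the error propagate unchanged.
        outcome = type(exc).__name__
        raise
    finally:
        elapsed_ms = (time.monotonic() - started) * 1000.0
        print(json.dumps({
            "step": "kinesis_put_record",
            "outcome": outcome,
            "elapsed_ms": round(elapsed_ms, 1),
            "remaining_ms": context.get_remaining_time_in_millis(),
            "request_id": context.aws_request_id,
        }))
```

One caveat: if the invocation hits the Lambda timeout while the call is still in flight, this log line will most likely never be written, so a timed-out request ID with no matching line would itself suggest the time was spent inside `put_record`.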

AWS Lambda will sometimes force-recycle containers, which feels like a cold start even though your function is being reasonably exercised and kept warm. This can bring all the cold-start-related issues, such as extra delays for ENIs if your Lambda is attached to a VPC, and so on. But even for a simple function like yours, a 1-second timeout is sometimes too optimistic for a cold start.

I don't know of any documentation on those forced recycles, only of reports from people who have seen evidence of them.

"We see a forced recycle about 7 times a day." source

"It also appears that even once warmed, high concurrency functions get recycled much faster than those with just a few in memory." source

I wonder how you could confirm that this is the case. Perhaps you could check whether the errors in the CloudWatch log streams come from containers that never appeared before.
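A rough way to make that check easier is to have the function mark cold starts itself. The sketch below relies on module-level code running once per container; the helper name `log_container_info` and the field names are assumptions, while `log_stream_name` and `aws_request_id` are attributes of the Lambda Python context object.

```python
import json
import time

# Module-level code runs once per container, so these values describe the container.
_CONTAINER_STARTED_AT = time.time()
_invocation_count = 0


def log_container_info(context):
    """Log whether this is the first invocation handled by this container."""
    global _invocation_count
    _invocation_count += 1
    info = {
        "cold_start": _invocation_count == 1,
        "invocations_in_container": _invocation_count,
        "container_age_s": round(time.time() - _CONTAINER_STARTED_AT, 1),
        "log_stream": context.log_stream_name,
        "request_id": context.aws_request_id,
    }
    print(json.dumps(info))
    return info
```

Calling this at the top of the handler would let you filter the logs for `"cold_start": true` around the timed-out request IDs. If the time-outs cluster in fresh log streams, that would support the forced-recycle theory; if they occur in long-lived containers, the cause is probably elsewhere.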

101

answered Oct 25 '22 22:10

villasv

Tags: amazon-web-services, aws-lambda, aws-api-gateway, amazon-kinesis
