 

S3 Lambda trigger double invocation after exactly 10 minutes

We are experiencing double invocations of Lambdas triggered by S3 ObjectCreated events. The second invocation happens exactly 10 minutes after the first invocation started, not 10 minutes after the first attempt completed. The original invocation takes anywhere between 0.1 and 5 seconds. No invocation results in an error; they all complete successfully.

We are aware that SQS, for example, guarantees at-least-once rather than exactly-once delivery, and we would accept some Lambdas being invoked a second time due to the nature of the underlying distributed system. A delay of exactly 10 minutes, however, seems very odd.

Out of about 10k messages, 100-200 result in double invocations.

AWS Support basically says "the 10 minute wait time is by design but we cannot tell you why", which is not at all helpful.


  • Has anyone else experienced this behaviour before?
  • How did you solve the issue or did you simply ignore it (which we could do)?
  • One proposed solution is not to use a direct S3-to-Lambda trigger, but to let S3 publish its event to SNS and subscribe the Lambda to that topic. Any experience with that approach? (A rough sketch of what such a handler would look like follows this list.)
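
For reference, if S3 published the ObjectCreated event to an SNS topic and the Lambda subscribed to that topic, the handler would have to unwrap the S3 notification from the SNS message. A minimal Python sketch, where the function names are illustrative and not from the original setup:

import json

def handler(event, context):
    # Each SNS record carries the original S3 notification as a JSON string.
    for sns_record in event['Records']:
        s3_event = json.loads(sns_record['Sns']['Message'])
        for s3_record in s3_event.get('Records', []):
            bucket = s3_record['s3']['bucket']['name']
            key = s3_record['s3']['object']['key']
            process_image(bucket, key)

def process_image(bucket, key):
    # Placeholder for the actual image processing.
    print("processing s3://{}/{}".format(bucket, key))

Note that SNS delivery is also at-least-once, so routing through SNS changes the event path but does not by itself eliminate duplicate invocations.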

example log: two invocations, 10 minutes apart, same RequestId

START RequestId: f9b76436-1489-11e7-8586-33e40817cb02 Version: 13
2017-03-29 14:14:09 INFO ImageProcessingLambda:104 - handle 1 records

and

START RequestId: f9b76436-1489-11e7-8586-33e40817cb02 Version: 13
2017-03-29 14:24:09 INFO ImageProcessingLambda:104 - handle 1 records

asked May 04 '17 by luk2302


2 Answers

After a couple of rounds with AWS Support and others, and a few isolated trial runs, it seems like this is simply "by design". It is not clear why; it simply happens. The problem is neither S3 nor SQS/SNS, but the Lambda invocation itself and how the Lambda service dispatches invocations to Lambda instances.

The double invocations happen for somewhere between 1% and 3% of all invocations, 10 minutes after the first invocation. Surprisingly, there are even triple (and probably quadruple) invocations, occurring at roughly successive powers of the base probability, so basically 0.09% for triples, and so on. The triple invocations happened 20 minutes after the first one.

If you encounter this, you simply have to work around it using whatever you have access to. For example, we now store the already processed entities in Cassandra with a TTL of 1 hour, and the Lambda only acts on a message if the entity has not been processed yet. The double and triple invocations all happen within this one-hour timeframe.
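
A minimal sketch of that dedup check using the Python Cassandra driver; the contact point, keyspace, table name and the choice of the S3 object key as entity id are assumptions, not the actual setup described above:

from cassandra.cluster import Cluster

cluster = Cluster(['cassandra-host'])           # assumed contact point
session = cluster.connect('image_processing')   # assumed keyspace
# Assumed table: CREATE TABLE processed_entities (entity_id text PRIMARY KEY)

# Lightweight transaction: the row is only written if it does not exist yet,
# and it expires after one hour, matching the 1 hour TTL above.
claim = session.prepare(
    "INSERT INTO processed_entities (entity_id) VALUES (?) "
    "IF NOT EXISTS USING TTL 3600"
)

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        applied = session.execute(claim, [key]).one()[0]  # first column is [applied]
        if not applied:
            continue  # a previous invocation already handled this entity
        process_image(bucket, key)

def process_image(bucket, key):
    # Placeholder for the actual processing.
    pass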

answered Nov 03 '22 by luk2302

Not wanting to spin up a data store like Dynamo just to handle this, I did two things to cover our use case:

  • Write a lock file per function into S3 (which we were already using for this function anyway) and check for its existence on function entry, aborting if it is present; we only ever want one instance of this function running at a time. The lock file is removed before we call the callback, on error or success.
  • Write a request time into the initial event payload and check it on function entry; if the request time is too old, abort. We don't want Lambda retries on error unless they happen quickly, so this handles the case where a duplicate or retry arrives while no other invocation of the same function is running (if one were running, the lock file would stop it), and it also avoids the small overhead of the S3 lock-file requests in that case. (A rough sketch of both checks follows this list.)
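
The original appears to be Node.js (it mentions a callback), but roughly the same two checks in Python with boto3 would look like this; the bucket, lock key, event field name and age threshold are all illustrative assumptions:

import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
LOCK_BUCKET = 'my-work-bucket'       # assumed: the bucket the function already uses
LOCK_KEY = 'locks/image-processor'   # assumed lock object key
MAX_REQUEST_AGE_SECONDS = 60         # assumed "too old" threshold

def handler(event, context):
    # Check the request time first; it costs nothing and avoids the S3 calls below.
    request_time = event.get('requestTime', 0)  # assumed field written by the producer
    if time.time() - request_time > MAX_REQUEST_AGE_SECONDS:
        return 'request too old, ignoring'

    # Only one invocation at a time: abort if the lock object already exists.
    try:
        s3.head_object(Bucket=LOCK_BUCKET, Key=LOCK_KEY)
        return 'another invocation is already running, ignoring'
    except ClientError as e:
        if e.response['Error']['Code'] != '404':
            raise  # a real error, not just "no lock present"

    s3.put_object(Bucket=LOCK_BUCKET, Key=LOCK_KEY, Body=b'')
    try:
        do_the_work(event)
    finally:
        # Remove the lock whether we succeed or fail.
        s3.delete_object(Bucket=LOCK_BUCKET, Key=LOCK_KEY)

def do_the_work(event):
    pass  # placeholder for the actual processing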
answered Nov 03 '22 by El Yobo