The AWS SQS -> Lambda integration lets you process incoming messages in batches, where you configure the maximum number of messages a single batch can contain. If the function throws an exception during processing to indicate failure, none of the messages in the batch are deleted from the incoming queue; they become visible again once the visibility timeout has passed and can be picked up by another Lambda invocation.
Is there any way to keep batch processing, for performance reasons, but let the messages that succeed be deleted from the inbound queue while leaving only the failed messages from the batch undeleted?
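If a Lambda function throws an error, the Lambda service continues to process the failed message until either the message is processed without any error from the function, at which point the service deletes it from the queue, or the message retention period is reached and SQS deletes it from the queue. If the queue has a redrive policy, the message is instead moved to the dead-letter queue once its receive count exceeds the policy's maxReceiveCount.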
A Lambda function can process items from multiple queues (using one event source mapping for each queue). You can also use the same queue with multiple Lambda functions.
Batch size – The number of records to send to the function in each batch. For a standard queue, this can be up to 10,000 records. For a FIFO queue, the maximum is 10.
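As a rough illustration of the batch-size limits above, here is a sketch of a hypothetical helper, `build_sqs_mapping_kwargs`, that validates `BatchSize` before building keyword arguments for boto3's `create_event_source_mapping` call. It detects FIFO queues by the `.fifo` suffix that FIFO queue names must carry, and it assumes the documented rule that batches larger than 10 need `MaximumBatchingWindowInSeconds` of at least 1; check the current Lambda documentation before relying on either limit.

```python
# Hypothetical helper: builds keyword arguments for boto3's
# lambda_client.create_event_source_mapping(), checking BatchSize against
# the per-queue-type limits described above. It makes no AWS calls itself.

def build_sqs_mapping_kwargs(queue_arn: str, function_name: str,
                             batch_size: int, batching_window_s: int = 0) -> dict:
    # FIFO queue names must end in ".fifo", so the queue ARN does too.
    is_fifo = queue_arn.endswith(".fifo")
    max_batch = 10 if is_fifo else 10_000
    if not 1 <= batch_size <= max_batch:
        raise ValueError(
            f"BatchSize {batch_size} out of range 1..{max_batch} "
            f"for a {'FIFO' if is_fifo else 'standard'} queue")
    # Assumption (per AWS docs at time of writing): batches larger than 10
    # require a batching window of at least 1 second.
    if batch_size > 10 and batching_window_s < 1:
        raise ValueError(
            "BatchSize > 10 requires MaximumBatchingWindowInSeconds >= 1")
    kwargs = {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
    }
    if batching_window_s:
        kwargs["MaximumBatchingWindowInSeconds"] = batching_window_s
    return kwargs
```

You would then pass the result to `boto3.client("lambda").create_event_source_mapping(**kwargs)`; keeping the validation separate from the AWS call makes it easy to test without credentials.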
When using SQS as a Lambda event source mapping, Lambda functions are triggered with a batch of messages from SQS. If your function fails to process any message in the batch, the entire batch returns to your SQS queue once the visibility timeout expires, and your Lambda function will be triggered with those messages again.
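To make this default all-or-nothing behavior concrete, here is a minimal Python handler sketch. The record fields (`messageId`, `receiptHandle`, `body`) are the ones Lambda puts in each SQS record; `process` and the `fail` flag in the message body are hypothetical stand-ins for real business logic.

```python
import json


def process(body: dict) -> None:
    """Hypothetical business logic; raises on failure."""
    if body.get("fail"):
        raise RuntimeError(f"could not process order {body.get('id')}")


def handler(event, context):
    # Lambda delivers the SQS batch under event["Records"]; each record
    # carries messageId, receiptHandle and the message body as a string.
    for record in event["Records"]:
        process(json.loads(record["body"]))
    # Returning normally tells Lambda the whole batch succeeded, so the
    # service deletes every message. An exception raised above fails the
    # invocation, and every message in the batch is retried later.
    return {"processed": len(event["Records"])}
```

A single failing record is enough to make the whole invocation fail, even if every other record in the batch was processed correctly.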
The problem with manually re-enqueueing the failed messages is that you can get into an infinite loop where those items perpetually fail, get re-enqueued, and fail again. Because each one is resent to the queue as a new message, its receive count is reset every time, which means it will never be moved to a dead-letter queue. You also lose the benefits of the visibility timeout, since the re-sent copy does not wait out the original message's timeout. This is also bad for monitoring, since you'll never know you're in a bad state unless you manually check your logs.
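The receive-count problem can be seen in a toy model. The sketch below is not real SQS: it is a simplified, hypothetical simulation in which a message is delivered once per round, always fails, and is moved to a dead-letter queue once another delivery would push its receive count past `max_receive_count`. Re-enqueueing resets the count, so the message never reaches the DLQ.

```python
# Toy model, not real SQS: one delivery per round, processing always fails.
# With a redrive policy, a message whose receive count would exceed
# max_receive_count on its next delivery is moved to the DLQ instead.

def simulate(rounds: int, max_receive_count: int, reenqueue: bool) -> dict:
    receive_count = 0
    for r in range(1, rounds + 1):
        if receive_count >= max_receive_count:
            return {"in_dlq": True, "round": r, "receive_count": receive_count}
        receive_count += 1  # delivered to Lambda; processing fails
        if reenqueue:
            # Anti-pattern: send a copy as a brand-new message and delete
            # the original, so the copy starts with a receive count of 0.
            receive_count = 0
    return {"in_dlq": False, "round": rounds, "receive_count": receive_count}
```

With `max_receive_count=3`, `simulate(10, 3, False)` reports the message in the DLQ on round 4, after three failed deliveries, while `simulate(10, 3, True)` never moves it there no matter how many rounds you run.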
A better approach is to manually delete the successful items and then throw an exception to fail the rest of the batch. The successful items are removed from the queue, while all the items that actually failed hit their normal visibility timeout periods and retain their receive count values, so you can actually use and monitor a dead letter queue. This is also less work overall than the other approach.
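A minimal sketch of that approach in Python, assuming a boto3-style SQS client is passed in. Its `delete_message_batch(QueueUrl=..., Entries=...)` call is real boto3 API; injecting the client is just a choice that keeps the sketch testable. `handle_batch`, `PartialBatchFailure`, and the `process` callback are hypothetical names. `DeleteMessageBatch` accepts at most 10 entries per request, so successes are deleted in chunks of 10.

```python
import json


class PartialBatchFailure(Exception):
    """Hypothetical error raised after successes are deleted."""


def handle_batch(event, sqs_client, queue_url, process):
    succeeded, failed_ids = [], []
    for record in event["Records"]:
        try:
            process(json.loads(record["body"]))
            succeeded.append(record)
        except Exception:
            failed_ids.append(record["messageId"])

    if not failed_ids:
        # Whole batch succeeded: return normally and let Lambda delete it.
        return {"processed": len(succeeded)}

    # Delete the successes ourselves; DeleteMessageBatch takes <= 10 entries.
    for i in range(0, len(succeeded), 10):
        chunk = succeeded[i:i + 10]
        resp = sqs_client.delete_message_batch(
            QueueUrl=queue_url,
            Entries=[{"Id": r["messageId"], "ReceiptHandle": r["receiptHandle"]}
                     for r in chunk],
        )
        # A failed delete only means that message will be redelivered later.
        for f in resp.get("Failed", []):
            print(f"delete failed for {f['Id']}: {f.get('Message')}")

    # Failing the invocation leaves only the undeleted (failed) messages on
    # the queue; they keep their receive count and reappear after the timeout.
    raise PartialBatchFailure(f"{len(failed_ids)} message(s) failed: {failed_ids}")
```

In a real handler you might call `handle_batch(event, boto3.client("sqs"), queue_url, process)`, with `queue_url` taken from configuration of your choosing. Because a failed delete just leads to redelivery, `process` should be idempotent.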