When the message has been added to an SQS queue and it is configured to trigger a lambda function (nodejs).
When a lambda function is triggered - I may want to retry same message again after 5 minute without deleting the message from the Queue. The reason I want to do this if Lambda could not connect external host (eg: API) - i like to try again after 5 minutes for 3 attempts only.
How can that be written in node js?
For example in Laravel, we can Specifying Max Job Attempts
functionality. The number of times the job may be attempted using public $tries = 5;
Source: https://laravel.com/docs/5.7/queues#max-job-attempts-and-timeout
How can we do similar fashion in node.js?
I am thinking adding a message to another queue (for retry). A lambda function read all the messages from that queue after 5 minutes and send that message back to main Queue and it will be trigger a lambda function.
You can have maximum of 2 retries with lambda for asynchronous invocations and up to 1000 SQS retries.
If a Lambda function throws an error, the Lambda service continues to process the failed message until: The message is processed without any error from the function, and the service deletes the message from the queue. The Message retention period is reached and SQS deletes the message from the queue.
Maximum Retry AttemptsWhen a function returns an error after execution, Lambda attempts to run it two more times by default. With Maximum Retry Attempts, you can customize the maximum number of retries from 0 to 2. This gives you the option to continue processing new events with fewer or no retries.
With a state machine, you can exit the Lambda when you are running low on time and return information about where you are in the process. That information can then be passed into the Lambda in its next execution, looping until you return something that indicates you are complete.
Re-tries and re-tries "timeout" can all be configured directly in the SQS queue.
When you create a queue, set up the following attributes:
The Default Visibility Timeout will be the time that the message will be hidden once it has been received by your application. If the message fails during the lambda run and an exception is thrown, lambda will not delete any of the messages in the batch and all of them will eventually re-appear in the queue.
If you only want to try 3 times, you must set the SQS re-drive policy (AKA Dead Letter Queue)
The re-drive policy will enable your queue to redirect messages to a Dead Letter Queue (DLQ) after the message has re-appeared in the queue N
number of times, where N
is a number between 1 and 1000.
It is essential to understand that lambda will continue to process a failed message (a message that generates an exception in the code) until:
Message Retention Period
expires (SQS deletes the message)Lambda will not dispose of this bad message otherwise.
Based on several experiments I ran to understand the behavior of the SQS integration (the documentation on re-tries can be ambiguous).
Lambda will not delete failed messages and will continue to re-try them. Even if you have a Lambda DLQ setup, failed messages will not be sent to the lambda DLQ. Lambda fully relies on the configuration of the SQS queue for this purpose as stated in the lambda DLQ documentation.
Recommendation:
As I stated earlier if there is an exception in your code while processing a message, the whole batch of messages is re-tried, it doesn't matter if some of the messages were processed correctly. If for some reason a downstream service is failing you may end up with messages that were processed in the DLQ.
Recommendation:
The blog post "Lambda Concurrency Limits and SQS Triggers Don’t Mix Well (Sometimes)" describes how, if your concurrency limit is set too low, lambda may cause batches of messages to be throttled and the received attempt to be incremented without ever being processed.
Recommendation:
The post and Amazon's recommendations are:
- Set the queue’s visibility timeout to at least 6 times the timeout that you configure on your function.
- The extra time allows for Lambda to retry if your function execution is throttled while your function is processing a previous batch.
- Set the maxReceiveCount on the queue’s re-drive policy to at least 5. This will help avoid sending messages to the dead-letter queue due to throttling.
- Configure the dead-letter to retain failed messages long enough so that you can move them back later to be reprocessed
Here is how I did it.
(Q1/Q2) SQS Trigger --> Lambda L1 (if failed, delete on (Q1/Q2), drop it on Q2) --> On Failure DLQ
When messages arrive on Q1 it triggers Lambda L1 if success goes from there. If fails, drop it to Q2 (which is a delayed queue). Every message that arrives on Q2 will have a delay of 5 minutes.
If your initial message can have a delay of 5 mins, then you might not need two queues. One queue should be good. If the initial delay is not acceptable then you need two queues. One another reason to have two queues, you will always have a way for new messages that comes in the path.
If you have a code failure in handling Q1/Q2 aws infrastructure will retry immediately for 3 times before it sends it to DLQ1. If you handle the error in the code, then you can get the pipeline to work with the timings you mentioned.
SQS Delay Queues:
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
SQS Lambda Architecture:
https://nordcloud.com/amazon-sqs-as-a-lambda-event-source/
Hope it helps.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With