I am testing the retry logic of the step function. Theoretically following step function should have been retried to execute the lambda 3 times if it fails.
{
"StartAt": "Bazinga",
"States": {
"Bazinga": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:ap-southeast-2:518815385770:function:errorTest:$LATEST",
"Payload": {
"Input.$": "$"
}
},
"Retry" : [
{
"ErrorEquals": [ "States.All", "States.Timeout" ],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 1.0
}
],
"Next": "Fail"
},
"Fail": {
"Type": "Fail"
}
}
}
The lambda it calls is set to timeout in 3 seconds. The lambda freezes for 4 seconds. This means the lambda times out and throws States.Timeout
error. The code is given below:
function sleep(ms){
return new Promise(resolve=>{
setTimeout(resolve,ms)
})
}
exports.handler = async (event) => {
console.log('------------> executing ....')
await sleep(4000)
};
The problem is step function does not retry the task. This can be confirmed from the following CloudWatch
logs.
05:59:36
START RequestId: dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 Version: $LATEST
05:59:36
2019-07-24T05:59:36.340Z dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 INFO ------------> executing ....
05:59:39
END RequestId: dd1a2ee9-f389-44be-aaa6-07f2ca7983b0
05:59:39
REPORT RequestId: dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 Duration: 3003.29 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 26 MB
05:59:39
2019-07-24T05:59:39.317Z dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 Task timed out after 3.00 seconds
Not sure what went wrong. Any help is appreciated, thanks in advance.
Handling a failure using Retry The function is retried twice with an exponential backoff between retries. This variant uses the predefined error code States. TaskFailed , which matches any error that a Lambda function outputs.
BackoffRate – the value of this key signifies the multiplier by which the retry interval (IntervalSeconds) increases after each retry attempt. For example, the first retry attempt will wait 3 seconds, and the second retry attempt will wait 4.5 seconds.
When a function returns an error after execution, Lambda attempts to run it two more times by default. With Maximum Retry Attempts, you can customize the maximum number of retries from 0 to 2. This gives you the option to continue processing new events with fewer or no retries.
To stop it from retrying, you should make sure that any error is handled and you tell Lambda that your invocation finished successfully by returning a non-error (or in Node, calling callback(null, <any>) . To do this you can enclose the entire body of your handler function with a try-catch.
To answer my own question, there were 2 problems with the retry logic I placed.
States.All
should have been States.ALL
(notice the case of L)Lambda.Unknown
instead of States.Timeout
.I updated my step function with following code and now it works:
{
"StartAt": "Bazinga",
"States": {
"Bazinga": {
"Type": "Task",
"Resource": "arn:aws:states:::lambda:invoke",
"Parameters": {
"FunctionName": "arn:aws:lambda:ap-southeast-2:518815385770:function:errorTest:$LATEST",
"Payload": {
"Input.$": "$"
}
},
"Retry" : [
{
"ErrorEquals": [ "States.Timeout", "Lambda.Unknown" ],
"IntervalSeconds": 1,
"MaxAttempts": 3,
"BackoffRate": 1.0
}
],
"Next": "Fail"
},
"Fail": {
"Type": "Fail"
}
}
}
Because you didn't define TimeoutSeconds
in the ASL. Example:
"Type": "Task",
"Resource": "${FunctionArn}",
"TimeoutSeconds": 3,
Otherwise it will throw out Lambda.Unknown
failure
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With