We are hitting this problem deterministically and aren't sure where we're misconfigured. For lambdas running less than ~5 minutes, our invocation successfully wraps up ~0.5 seconds after the lambda completes. For anything running longer than that, however, the lambda logs show that the lambda completes, but our client invocation throws a ClientExecutionTimeoutException after 15 minutes.
After encountering the problem with other (otherwise successful) lambdas, we created a basic test lambda on Node with a sleep function and have been able to deterministically reproduce the issue:
function sleep(s) {
  return new Promise(resolve => setTimeout(resolve, s * 1000));
}
// sleep() takes seconds: 60 * 5 = 300 seconds, i.e. 5 minutes
const sleepSeconds = 60 * 5;
exports.handler = async (event) => {
  console.log(`received lambda invocation, sleeping ${sleepSeconds} seconds`);
  const response = {
    statusCode: 200,
    body: JSON.stringify(`finished running, slept for ${sleepSeconds} seconds`),
  };
  await sleep(sleepSeconds);
  console.log('finished sleeping');
  return response;
};
Our lambda invocation client is using these client configs:
clientConfig.setRetryPolicy(PredefinedRetryPolicies.NO_RETRY_POLICY);
clientConfig.setMaxErrorRetry(0);
clientConfig.setSocketTimeout(15 * 60 * 1000);
clientConfig.setRequestTimeout(15 * 60 * 1000);
clientConfig.setClientExecutionTimeout(15 * 60 * 1000);
Is there a ~5 minute timeout config we're missing?
Invocation errors can be caused by issues with request parameters, event structure, function settings, user permissions, resource permissions, or limits. If you invoke your function directly, you see any invocation errors in the response from Lambda.
To troubleshoot the retry and timeout issues, first review the logs of the API call to find the problem. Then, change the retry count and timeout settings of the AWS SDK as needed for each use case. To allow enough time for a response to the API call, add time to the Lambda function timeout setting.
The default timeout of a Lambda function is three seconds. This means that if you don't explicitly configure a timeout, your function invocations are terminated after three seconds. If the function calls a few services, some of which are currently at capacity, a single request can easily take a second on its own.
To find the root cause of the timeout, keep in mind that there are many reasons why a function might time out, but the most likely is that it was waiting on an IO operation to complete. Maybe it was waiting on another service (such as DynamoDB or Stripe) to respond.
The Javadoc in aws-sdk-java says:
For functions with a long timeout, your client might be disconnected during synchronous invocation while it waits for a response. Configure your HTTP client, SDK, firewall, proxy, or operating system to allow for long connections with timeout or keep-alive settings.
On the other hand, AWS Lambda executions were previously limited to 5 minutes; this limit was later raised to 15 minutes.
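The timeout and keep-alive settings the Javadoc refers to can be illustrated at the plain JDK socket level. This is only a sketch, not the SDK's own code: the class name LongConnectionSketch and its two helper methods are made up for illustration, and it uses a local server that never answers in place of a long-running Lambda. It shows that the socket (read) timeout bounds how long the client waits on a silent connection, and that TCP keep-alive is a separate, opt-in setting. In the AWS SDK for Java 1.x, the corresponding knobs should be clientConfig.setSocketTimeout(...) and clientConfig.setUseTcpKeepAlive(true). Note that the interval at which keep-alive probes are sent is controlled by the operating system and is often measured in hours by default, so enabling the flag alone may not be enough.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class LongConnectionSketch {

    /** Enables TCP keep-alive and sets the read timeout on the given socket. */
    public static Socket configure(Socket socket, int readTimeoutMillis) throws IOException {
        socket.setKeepAlive(true);              // ask the OS to probe an idle connection
        socket.setSoTimeout(readTimeoutMillis); // max time a single read may block
        return socket;
    }

    /**
     * Connects to a local server that accepts the TCP handshake but never
     * sends a byte, then reports whether the read gave up with a timeout.
     */
    public static boolean readTimesOutOnSilentPeer(int readTimeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = configure(new Socket(), readTimeoutMillis)) {
            client.connect(server.getLocalSocketAddress());
            try {
                client.getInputStream().read(); // blocks: the peer stays silent
                return false;
            } catch (SocketTimeoutException e) {
                return true;                    // the read timeout fired
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOutOnSilentPeer(200));
    }
}
```

If a firewall, proxy, or NAT device silently drops the idle connection while the Lambda is still running, the client is in the same position as in this sketch: nothing ever arrives, and it only gives up once its own timeout expires. That would be consistent with the 15-minute ClientExecutionTimeoutException described in the question, although the sketch does not prove that this is the cause.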
I would check AWSLambdaAsyncClient.invokeAsync() for long-running invocations.