I use aws api gateway integrated with aws lambda(java), but I'm seeing some serious problems in this approach. The concept of removing the server and having your app scaled out of the box is really nice but here are the problem I'm facing. My lambda is doing 2 simple things- validate the payload received from the client and then send it to a kinesis stream for further processing from another lambda(you will ask why I don't send directly to the stream and only use 1 lambda for all of the operations. Let's just say that I want to separate the logic and have a layer of abstraction and also be able to tell the client that he's sending invalid data.). In the implementation of the lambda I integrated the spring DI. So far so good. I started making performance testing. I simulated 50 concurrent users making 4 requests each with 5 seconds between the requests. So what happened- In the lambda's coldstart I initialize the spring's application context but it seems that having so many simultaneous requests when the lambda was not started is doing some strange things. Here's a screenshot of the times the context was initialized for. <img src="https://i.stack.imgur.com/k2Clu.png" alt="enter image description here"> What we can see from the screenshot is that the times for initializing the context have big difference. My assumption of what happening is that when so many requests are received and there's no "active" lambda it initializes a lambda container for every one of them and in the same time it "blocks" some of them(the ones with the big times of 18s) until the others already started are ready. So maybe it has some internal limit of the containers it can start at the same time. The problem is that if you don't have equally distributed traffic this will happen from time to time and some of the requests will timeout. We don't want this to happen. So next thing was to do some tests without spring container as my thought was "ok, the initialization is heavy, let's just make plain old java objects initialization". And unfortunatelly the same thing happened(maybe just reduced the 3s container initialization for some of the requests). Here is a more detailed screenshot of the test data: <img src="https://i.stack.imgur.com/MW2ct.png" alt="enter image description here"> So I logged the whole lambda execution time(from construction to the end), the kinesis client initialization and the actual sending of the data to the stream as these are the heaviest operations in the lambda. We still have these big times of 18s or something but the interesting thing is that the times are somehow proportional. So if the whole lambda takes 18s, around 7-8s is the client initialization and 6-7 for sending the data to the stream and 4-5 seconds left for the other operations in the lambda which for the moment is only validation. On the other hand if we take one of the small times(which means that it reuses an already started lambda),i.e. 820ms, it takes 100ms for the kinesis client initialization and 340 for the data sending and 400ms for the validation. So this pushes me again to the thoughts that internally it makes some sleeps because of some limits. The next screenshot is showing what is happening on the next round of requests when the lamda is already started: <img src="https://i.stack.imgur.com/UfxXD.png" alt="enter image description here"> So we don't have this big times, yes we still have some relatively big delta in some of the request(which for me is also strange), but the things looks much better. So I'm looking for a clarification from someone who knows actually what is happening under the hood, because this is not a good behavior for a serious application which is using the cloud because of it's "unlimited" possibilities. And another question is related to another limit of the lambda-200 concurrent invocations in all lambdas within an account in a region. For me this is also a big limitation for a big application with lots of traffic. So as my business case in the moment(I don't know for the future) is more or less fire and forget the request. And I'm starting to think of changing the logic in the way that the gateway sends the data directly to the stream and the other lambda is taking care of the validation and the further processing. Yes, I'm loosing the current abstraction(which I don't need at the moment) but I'm increasing the application availability many times. What do you think?

The lambda execution time spikes to 18s because AWS launches new containers w/ your code to handle the incoming requests. The bootstrap time is ~18s. Assigning more RAM can significantly improve the performance of your lambda function, because you have more RAM, CPU and networking throughput! <blockquote> And another question is related to another limit of the lambda-200 concurrent invocations in all lambdas within an account in a region. </blockquote> You can ask to the AWS Support to increase that limit. I asked to increase that limit to 10,000 invocation/second and the AWS Support did it quickly!

AWS Lambda Performance issues

Tags:

java

amazon-web-services

aws-lambda

aws-api-gateway

amazon-kinesis

I use aws api gateway integrated with aws lambda(java), but I'm seeing some serious problems in this approach. The concept of removing the server and having your app scaled out of the box is really nice but here are the problem I'm facing. My lambda is doing 2 simple things- validate the payload received from the client and then send it to a kinesis stream for further processing from another lambda(you will ask why I don't send directly to the stream and only use 1 lambda for all of the operations. Let's just say that I want to separate the logic and have a layer of abstraction and also be able to tell the client that he's sending invalid data.).

In the implementation of the lambda I integrated the spring DI. So far so good. I started making performance testing. I simulated 50 concurrent users making 4 requests each with 5 seconds between the requests. So what happened- In the lambda's coldstart I initialize the spring's application context but it seems that having so many simultaneous requests when the lambda was not started is doing some strange things. Here's a screenshot of the times the context was initialized for.

enter image description here

What we can see from the screenshot is that the times for initializing the context have big difference. My assumption of what happening is that when so many requests are received and there's no "active" lambda it initializes a lambda container for every one of them and in the same time it "blocks" some of them(the ones with the big times of 18s) until the others already started are ready. So maybe it has some internal limit of the containers it can start at the same time. The problem is that if you don't have equally distributed traffic this will happen from time to time and some of the requests will timeout. We don't want this to happen.

So next thing was to do some tests without spring container as my thought was "ok, the initialization is heavy, let's just make plain old java objects initialization". And unfortunatelly the same thing happened(maybe just reduced the 3s container initialization for some of the requests). Here is a more detailed screenshot of the test data:

enter image description here

So I logged the whole lambda execution time(from construction to the end), the kinesis client initialization and the actual sending of the data to the stream as these are the heaviest operations in the lambda. We still have these big times of 18s or something but the interesting thing is that the times are somehow proportional. So if the whole lambda takes 18s, around 7-8s is the client initialization and 6-7 for sending the data to the stream and 4-5 seconds left for the other operations in the lambda which for the moment is only validation. On the other hand if we take one of the small times(which means that it reuses an already started lambda),i.e. 820ms, it takes 100ms for the kinesis client initialization and 340 for the data sending and 400ms for the validation. So this pushes me again to the thoughts that internally it makes some sleeps because of some limits. The next screenshot is showing what is happening on the next round of requests when the lamda is already started:

enter image description here

So we don't have this big times, yes we still have some relatively big delta in some of the request(which for me is also strange), but the things looks much better.

So I'm looking for a clarification from someone who knows actually what is happening under the hood, because this is not a good behavior for a serious application which is using the cloud because of it's "unlimited" possibilities.

And another question is related to another limit of the lambda-200 concurrent invocations in all lambdas within an account in a region. For me this is also a big limitation for a big application with lots of traffic. So as my business case in the moment(I don't know for the future) is more or less fire and forget the request. And I'm starting to think of changing the logic in the way that the gateway sends the data directly to the stream and the other lambda is taking care of the validation and the further processing. Yes, I'm loosing the current abstraction(which I don't need at the moment) but I'm increasing the application availability many times. What do you think?

754

asked Mar 29 '17 09:03

Hristo Angelov

1 Answers

The lambda execution time spikes to 18s because AWS launches new containers w/ your code to handle the incoming requests. The bootstrap time is ~18s.

Assigning more RAM can significantly improve the performance of your lambda function, because you have more RAM, CPU and networking throughput!

And another question is related to another limit of the lambda-200 concurrent invocations in all lambdas within an account in a region.

You can ask to the AWS Support to increase that limit. I asked to increase that limit to 10,000 invocation/second and the AWS Support did it quickly!

100

answered Sep 19 '22 23:09

Antonio

Related questions
                            
                                String Concatenation - valueOf or not
                            
                                How to capitalize the first and last letters of every word in a string in java
                            
                                Why do Java collections not provide a convenient map method? [duplicate]
                            
                                JSchException timeout: socket is not established
                            
                                Spring Boot not failing if property is missing
                            
                                Should Guava Splitters/Joiners be created each time they are used?
                            
                                Dagger 2 Component.Builder
                            
                                Wait for an image to be fully loaded Selenium WebDriver
                            
                                Using @JmsListener with multiple payload types for the same destination?
                            
                                Do I need to synchronize ConcurrentMap when adding key only if needed?
                            
                                write Java List to JSON Array via Jackson
                            
                                Why should the parameter of an equals() method of a new class be of type Object?
                            
                                Write cyrillic chars into PDF form fields with PDFBox
                            
                                JFXDialog using JFoenix
                            
                                Filter RxJava Observable with other Observable
                            
                                Is this query possible using Criteria or DetachedCriteria Hibernate
                            
                                Primefaces Chips Web Component
                            
                                Group received messages in RabbitMQ, preferably using Spring AMQP?
                            
                                How to delete, not clear row in Sheet ? Apache POI
                            
                                How to fix test failing with "No ModelAndView found"?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

AWS Lambda Performance issues

Tags:

java

amazon-web-services

aws-lambda

aws-api-gateway

amazon-kinesis

Hristo Angelov

People also ask

1 Answers

Antonio

Recent Activity

Donate For Us