
Using JPA/Hibernate in AWS Lambda (Java)

I must implement an AWS Lambda function in Java that consumes a Kinesis stream and reads/writes data in a MySQL database. As I already have the model entities defined in another application, I would like to reuse them rather than work with plain SQL/JDBC. So my goal is to implement the Lambda function using JPA/Hibernate. Is this possible in general? If yes, are there any real examples or best practices? I have previously worked on Spring Boot applications, where similar functionality is readily available and easily configurable, and now I don't even know where to start.

Archie asked Jan 16 '18 16:01



1 Answer

I disagree strongly with your assessment that JPA isn't feasible on Lambda. I have several functions running right now that demonstrate the point, with one caveat: JPA is only really feasible in a synchronous Lambda environment, or in an infrequently used one. (More on this in a moment.)

Persistence.createEntityManagerFactory should only be executed once per application, and the resulting EntityManagerFactory placed in a static or singleton context.
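A minimal sketch of that once-per-environment pattern. To keep it runnable without a JPA provider on the classpath, `StubFactory` is a hypothetical stand-in for `EntityManagerFactory`, and `JpaHolder` and `StreamHandler` are illustrative names, not part of any library. In a real function the static field would hold the result of `Persistence.createEntityManagerFactory(...)`, with your own persistence unit name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for EntityManagerFactory, so the sketch runs without JPA.
final class StubFactory {
    // Counts how many times the expensive bootstrap ran in this JVM.
    static final AtomicInteger CREATED = new AtomicInteger();

    StubFactory() {
        CREATED.incrementAndGet(); // imagine the slow Hibernate bootstrap here
    }

    // Stands in for createEntityManager(): cheap, called once per invocation.
    String createSession() {
        return "session";
    }
}

// Holds the single factory for this JVM, i.e. for one Lambda runtime environment.
final class JpaHolder {
    // Class initialization runs once and is thread-safe by the JVM's rules.
    // Real code (assumption): Persistence.createEntityManagerFactory("my-unit")
    private static final StubFactory FACTORY = new StubFactory();

    private JpaHolder() {}

    static StubFactory factory() {
        return FACTORY;
    }
}

// Hypothetical handler; a real one would implement the Lambda RequestHandler interface.
final class StreamHandler {
    String handle(String record) {
        // Only the cheap per-invocation object is created here.
        String session = JpaHolder.factory().createSession();
        return session + ":" + record;
    }
}
```

With this shape, warm invocations only pay for the per-invocation object (an EntityManager in real code); the bootstrap cost is paid once per cold start.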

Close the JPA resources in a JVM runtime shutdown hook.
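One way the shutdown hook could look, as a sketch rather than a definitive implementation. `ShutdownCleanup` and `closeOnShutdown` are hypothetical names; the only real API used is `Runtime.getRuntime().addShutdownHook`. The resource is typed as `AutoCloseable` so the example runs without JPA; with a real factory you could pass `emf::close`. Shutdown hooks are best effort in any JVM, so treat this as cleanup that usually runs, not as a guarantee.

```java
final class ShutdownCleanup {
    private ShutdownCleanup() {}

    // Registers a JVM shutdown hook that closes the resource and returns the hook thread.
    static Thread closeOnShutdown(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                // Nothing useful can be done this late; report and let the JVM exit.
                System.err.println("Failed to close resource: " + e);
            }
        }, "jpa-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Registering the hook once, right where the factory is created, keeps creation and cleanup in the same place.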

In practice, Lambda keeps the function's runtime environment alive for a while after an invocation, even when no data is being processed. (This is true for both synchronous and asynchronous functions.) When the next invocation arrives, it runs in the same runtime environment and can re-use the existing EntityManagerFactory to create new EntityManager instances. The major difference between synchronous and asynchronous invocations is the amount of concurrency. Synchronous functions have very limited concurrency, whereas the concurrency of asynchronous functions can vary widely, so with asynchronous functions you pay the startup cost more often as the pool of function instances scales up and down.

Yes, the runtime environment will periodically be killed, whether the function is synchronous or asynchronous, especially when there is no traffic, and the next invocation then pays the slow startup cost. In my experience, though, subsequent invocations re-use the same runtime environment(s), especially with synchronous functions such as those hooked into a Kinesis Stream.

Incidentally, I highly recommend limiting yourself to synchronous writers, as this limits the number of Lambda instances and hence the number of JDBC pools connecting to your database. (This can become a serious issue on the DB side: a database generally accepts only a few dozen or a few hundred connections, and each JDBC pool opens roughly 10 of them.) For Kinesis Streams, the number of concurrent Lambda instances is equal to the number of shards, and if your records are smallish, you can greatly increase the effective capacity of each shard by using the KPL with record aggregation. The Lambda function itself will only receive < 6MB of data from the stream per invocation.
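The connection arithmetic behind that advice can be sketched as follows. `ConnectionBudget` is a hypothetical helper, and the shard count (8) and the asynchronous burst size (200) are made-up illustration values; 151 is MySQL's default max_connections, and 10 is the rough pool size mentioned above.

```java
final class ConnectionBudget {
    private ConnectionBudget() {}

    // Worst case: every concurrent Lambda instance fills its own JDBC pool.
    static int worstCaseConnections(int concurrentInstances, int poolSizePerInstance) {
        return Math.multiplyExact(concurrentInstances, poolSizePerInstance);
    }

    // True if the worst case still fits under the database's connection limit.
    static boolean fits(int concurrentInstances, int poolSizePerInstance, int dbMaxConnections) {
        return worstCaseConnections(concurrentInstances, poolSizePerInstance) <= dbMaxConnections;
    }

    public static void main(String[] args) {
        int dbMax = 151;   // MySQL's default max_connections
        int poolSize = 10; // rough per-instance pool size

        // Kinesis consumer: one instance per shard, here an assumed 8 shards.
        System.out.println(worstCaseConnections(8, poolSize) + " fits: " + fits(8, poolSize, dbMax));     // 80 fits: true
        // Asynchronous burst: an assumed 200 concurrent instances.
        System.out.println(worstCaseConnections(200, poolSize) + " fits: " + fits(200, poolSize, dbMax)); // 2000 fits: false
    }
}
```

If the worst case does not fit, the two levers are the ones named above: fewer concurrent instances, or a smaller pool per instance.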

SplinterReality answered Sep 19 '22 11:09
