 

Fast data access for AWS Lambda function

I have a Python-based Lambda function which triggers on S3 put operations from a Kinesis Firehose stream that sends data at a rate of around 10k records per minute. Right now the Lambda function just performs some small fixups of the data and delivers it to a Logstash instance in batches of 100. The Lambda execution time is 5-12 seconds, which is fine as it runs every minute.

We're looking at enriching the streamed data with some more info before sending it to Logstash. Each incoming message has an "id" field, and we'd like to look up that id against a DB of some sort, grab some extra info from the DB, and inject it into the object before passing it on.

Problem is, I cannot make it go fast enough. I tried loading all the data (600k records) into DynamoDB and performing a lookup for each record inside the Lambda function's loop. This slows down the execution way too much. Then I figured we don't have to look up the same id twice, so I'm using a list object to hold already looked-up data; this brought the execution time down somewhat, but still nowhere near what we'd like.
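One thing worth checking in that memoization step: a Python dict keyed by id makes the "have I seen this id?" check constant-time, whereas a list makes it a linear scan. A minimal sketch, assuming a hypothetical `fetch` standing in for the real DynamoDB call (all names here are illustrative, not from the actual function):

```python
# Sketch: cache already-looked-up ids in a dict keyed by id so each
# distinct id hits the database only once. A list also holds the data,
# but every membership check against it is a linear scan; dict
# membership is O(1) on average. `fetch` is a hypothetical stand-in
# for the real DynamoDB call.
def make_cached_lookup(fetch):
    cache = {}
    def lookup(record_id):
        if record_id not in cache:        # O(1) dict membership check
            cache[record_id] = fetch(record_id)
        return cache[record_id]
    return lookup

# Illustration with a counting stand-in for the database call:
calls = []
def fetch_from_db(rid):
    calls.append(rid)
    return {"id": rid, "extra": "info"}

lookup = make_cached_lookup(fetch_from_db)
lookup("a"); lookup("a"); lookup("b")     # "a" is fetched only once
```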

Then I thought about preloading the entire DB dataset. I tested this: simply dumping all 600k records from DynamoDB into a "cache list" object before starting to loop through each record from the S3 object. The data dumps in about one minute, but the cache list is now so large that each lookup against it takes 5 seconds (way slower than hitting the DB).
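For what it's worth, the slowness described here is a property of the list, not of preloading itself: searching a 600k-entry list scans it front to back on every lookup. Building the preloaded cache as a dict keyed by "id" gives average constant-time lookups. A minimal sketch with illustrative field names:

```python
# Sketch: preload the dataset into a dict keyed by "id" instead of a
# list. A list lookup scans all ~600k entries (O(n) per lookup); a
# dict lookup is O(1) on average. `rows` stands in for the DynamoDB
# dump; field names are illustrative.
rows = [{"id": i, "extra": i * 2} for i in range(600_000)]
cache = {row["id"]: row for row in rows}

record_id = 599_999
extra = cache[record_id]["extra"]  # constant-time, no scan
```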

I'm at a loss on what to do here. I totally realize that Lambda might not be the right platform for this, and we'll probably move to some other product if we can't make it work, but first I thought I'd see if the community had some pointers on how to speed this up.

Trondh asked Feb 25 '26 22:02

1 Answer

Preload the data into a Redis server. This is exactly what Redis is good at.
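A minimal sketch of that approach, assuming a Redis instance (e.g. AWS ElastiCache) preloaded with id-to-JSON mappings. The client is passed in so the enrichment logic is independent of connection details; the host name and field names are hypothetical:

```python
import json

def enrich(records, client, cache=None):
    """Merge extra fields from a Redis-like client (anything with a
    .get(key) method) into each record, memoizing per-id results so
    repeated ids in a batch cost only one round trip."""
    if cache is None:
        cache = {}
    out = []
    for rec in records:
        rid = rec["id"]
        if rid not in cache:
            raw = client.get(rid)
            cache[rid] = json.loads(raw) if raw else {}
        out.append({**rec, **cache[rid]})
    return out

# In the Lambda handler you would pass a real client, e.g.:
# client = redis.Redis(host="my-cache.example.com")  # hypothetical host
```

Keeping the client (and the cache dict) at module scope also lets warm Lambda invocations reuse the connection instead of reconnecting every run.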

Mark B answered Feb 27 '26 12:02