I'm trying to figure out an architecture for processing rather big files (maybe a few hundred MB) on serverless AWS. This is what I've got so far:
API Gateway -> S3 -> Lambda function -> SNS -> Lambda function
In this scenario, the text file is uploaded to S3 through API Gateway. Then a Lambda function is invoked by the event notification generated on S3. This Lambda function opens the text file and reads it line by line, generating tasks to be done as messages in an SNS topic. Each message will invoke a separate Lambda function to process the task.
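For reference, here is roughly what I imagine the first Lambda doing (a rough sketch in Python with boto3; the topic ARN is a placeholder, and the bucket/key come from the S3 event):

import urllib.parse
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

TASKS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:tasks"  # placeholder

def handler(event, context):
    # The S3 event notification tells us which object was uploaded.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])  # keys arrive URL-encoded

    # Stream the object and publish one SNS message per line.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    for line in body.iter_lines():
        sns.publish(TopicArn=TASKS_TOPIC_ARN, Message=line.decode("utf-8"))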
My only concern is the first Lambda function call. What if it times out? How can I make sure that it's not a point of failure?
You can ask S3 to return only a particular byte range of a given object, using the Range header: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectGET.html
for example:
Range: bytes=0-9
would return only the first 10 bytes of the S3 object.
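With boto3, that maps onto the Range parameter of get_object (a minimal sketch; the bucket and key names are placeholders):

import boto3

s3 = boto3.client("s3")

# Fetch only the first 10 bytes of the object.
resp = s3.get_object(Bucket="my-bucket", Key="big-file.txt", Range="bytes=0-9")
chunk = resp["Body"].read()  # at most 10 bytes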
To read a file line by line, you would have to decide on a specific chunk size (1 MB, for example), read one chunk of the file at a time, and split the chunk into lines by looking for newline characters. Since a chunk will usually end in the middle of a line, the trailing partial line has to be carried over to the next read. Once the whole chunk has been processed, you can re-invoke the Lambda and pass the chunk pointer as a parameter. The new invocation of the Lambda will read the file starting from the chunk pointer given as a parameter.
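Putting it together, a sketch of that pattern could look like the following. The function name, topic ARN, and event shape are assumptions for illustration, and the file is assumed to use a single-byte text encoding so chunk boundaries never split a character:

import json
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
lambda_client = boto3.client("lambda")

CHUNK_SIZE = 1024 * 1024                            # 1 MB per invocation
TASKS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:tasks"  # placeholder
SELF_FUNCTION_NAME = "file-splitter"                # placeholder: this function's own name

def handler(event, context):
    bucket, key = event["bucket"], event["key"]
    offset = event.get("offset", 0)
    leftover = event.get("leftover", "")  # partial line carried over from the previous chunk

    # Read only this invocation's slice of the object.
    resp = s3.get_object(Bucket=bucket, Key=key,
                         Range=f"bytes={offset}-{offset + CHUNK_SIZE - 1}")
    chunk = leftover + resp["Body"].read().decode("utf-8")

    lines = chunk.split("\n")
    leftover = lines.pop()  # the last element may be an incomplete line

    for line in lines:
        if line:
            sns.publish(TopicArn=TASKS_TOPIC_ARN, Message=line)

    # ContentRange looks like "bytes 0-1048575/234567890"; the total size is after "/".
    total_size = int(resp["ContentRange"].split("/")[-1])
    next_offset = offset + CHUNK_SIZE
    if next_offset < total_size:
        # Re-invoke this same function asynchronously with the new chunk pointer,
        # so each invocation stays well under the Lambda timeout.
        lambda_client.invoke(
            FunctionName=SELF_FUNCTION_NAME,
            InvocationType="Event",
            Payload=json.dumps({"bucket": bucket, "key": key,
                                "offset": next_offset, "leftover": leftover}),
        )
    elif leftover:
        sns.publish(TopicArn=TASKS_TOPIC_ARN, Message=leftover)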