AWS Lambda: How to extract a tgz file in an S3 bucket and put it in another S3 bucket

Question

I have an S3 bucket named "Source". Many .tgz files are pushed into that bucket in real time. I wrote Java code that extracts each .tgz file and pushes the contents into a "Destination" bucket, and I deployed that code as a Lambda function. Inside the function I receive the .tgz file as an InputStream. How can I extract it in Lambda? I am not able to create a file in Lambda; Java throws "FileNotFound (Permission denied)".

AmazonS3 s3Client = new AmazonS3Client();
S3Object s3Object = s3Client.getObject(new GetObjectRequest(srcBucket, srcKey));
InputStream objectData = s3Object.getObjectContent();
File file = new File(s3Object.getKey());
// The next line throws FileNotFound (Permission denied)
OutputStream writer = new BufferedOutputStream(new FileOutputStream(file));

Łukasz Wachowicz · Accepted Answer

Since one of the responses was in Python, I am providing an alternative solution in that language.

The problem with the solution that uses the /tmp file system is that AWS allows only 512 MB of storage there by default. To untar or unzip larger files, it is better to use the BytesIO class from the io package and process the file contents purely in memory. At the time of writing, AWS allowed up to 3 GB of RAM to be assigned to a Lambda function, which raises the maximum file size significantly. I successfully tested untarring a 1 GB S3 file.
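
To make the in-memory idea concrete without involving S3, here is a minimal sketch that unpacks a tar archive held entirely in a bytes object. The function names extract_members_in_memory and build_sample_tgz are hypothetical and only serve the illustration; the second one exists purely to create test input.

```python
import io
import tarfile


def extract_members_in_memory(archive_bytes):
    """Return {member name: contents} for every regular file in a tar archive."""
    extracted = {}
    # mode "r:*" lets tarfile detect gzip, bz2, xz or plain tar on its own
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile():  # skip directories, links, devices
                extracted[member.name] = tar.extractfile(member).read()
    return extracted


def build_sample_tgz(files):
    """Hypothetical helper: build a small .tgz in memory for local testing."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Nothing here touches the file system, so the only limit is the memory configured for the function. Note that this sketch keeps both the archive and every extracted file in memory at once.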

In my case, untarring ~2000 files from a 1 GB tar file into another S3 bucket took 140 seconds. It can be further optimized by using multiple threads to upload the untarred files to the target S3 bucket.
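
The multi-threaded variant mentioned above could look roughly like the following sketch. It is not the code that produced the 140-second figure; upload_in_parallel and the upload_one callback are hypothetical names, and the callback is injected so the fan-out logic can be tried locally without AWS credentials.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_in_parallel(files, upload_one, max_workers=8):
    """Call upload_one(name, data) for each entry of files using a thread pool.

    Returns the sorted list of names handed to upload_one and re-raises
    the first exception raised by any worker.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(upload_one, name, data): name
            for name, data in files.items()
        }
        # result() blocks until the task finishes and re-raises its exception
        for future in futures:
            future.result()
    return sorted(futures.values())
```

Inside a Lambda, upload_one would typically be a small wrapper around s3_client.put_object(Bucket=..., Key=name, Body=data). The boto3 documentation describes low-level clients as generally thread-safe, so a single shared client should be enough, but that is worth verifying for the boto3 version in use.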

The example code below presents a single-threaded solution:

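import tarfile
from io import BytesIO
from urllib.parse import unquote_plus

import boto3

s3_client = boto3.client('s3')

def untar_s3_file(event, context):
    # Bucket and key of the uploaded archive, taken from the S3 event record
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    # Object keys arrive URL-encoded in S3 event notifications
    key = unquote_plus(record['object']['key'])

    # Read the whole archive into memory
    input_tar_file = s3_client.get_object(Bucket=bucket, Key=key)
    input_tar_content = input_tar_file['Body'].read()

    with tarfile.open(fileobj=BytesIO(input_tar_content)) as tar:
        for tar_resource in tar:
            if tar_resource.isfile():
                inner_file_bytes = tar.extractfile(tar_resource).read()
                # Note: this writes back to the source bucket; pass the name of
                # your destination bucket as Bucket to write elsewhere
                s3_client.upload_fileobj(BytesIO(inner_file_bytes), Bucket=bucket, Key=tar_resource.name)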
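
If holding the whole archive in memory is a concern, one possible variation (not part of the tested solution above) is tarfile's stream mode, which reads the archive sequentially from any object that has a read() method. The sketch below assumes the 'Body' returned by get_object can be passed in as such an object; iter_tar_stream is a hypothetical name.

```python
import io
import tarfile


def iter_tar_stream(stream):
    """Yield (name, contents) for each regular file, reading the archive in order."""
    # "r|*" selects stream mode: no seeking, compression detected automatically
    with tarfile.open(fileobj=stream, mode="r|*") as tar:
        for member in tar:
            if member.isfile():
                # In stream mode each member must be read before moving on
                yield member.name, tar.extractfile(member).read()
```

With this approach only one extracted file is held in memory at a time, at the cost of losing random access to archive members.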

Tags: java, amazon-web-services, amazon-s3, aws-lambda
