I'm trying to find a way to extract .gz files in S3 on the fly, that is, without needing to download them locally, extract them, and then push them back to S3.
With boto3 + Lambda, how can I achieve my goal?
I didn't see any extract functionality in the boto3 documentation.
If your zipped data is stored on S3, this would typically involve downloading the file(s) to your local PC or laptop, unzipping them with a third-party tool like WinZip, then re-uploading the unzipped data files back to S3 for further processing.
You can use BytesIO to stream the file from S3, run it through gzip, then pipe it back up to S3 using upload_fileobj to write the BytesIO.
# python imports
import boto3
from io import BytesIO
import gzip

# setup constants
bucket = '<bucket_name>'
gzipped_key = '<key_name.gz>'
uncompressed_key = '<key_name>'

# initialize s3 client, this is dependent upon your aws config being done
s3 = boto3.client('s3', use_ssl=False)  # optional

s3.upload_fileobj(                      # upload a new obj to s3
    Fileobj=gzip.GzipFile(              # read in the output of gzip -d
        None,                           # just return output as BytesIO
        'rb',                           # read binary
        fileobj=BytesIO(s3.get_object(Bucket=bucket, Key=gzipped_key)['Body'].read())),
    Bucket=bucket,                      # target bucket, writing to
    Key=uncompressed_key)               # target key, writing to
Ensure that your key is reading in correctly:
# read the body of the s3 key object into a string to ensure download
s = s3.get_object(Bucket=bucket, Key=gzipped_key)['Body'].read()
print(len(s))  # check to ensure some data was returned
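Since the question also mentions Lambda, here is a rough sketch of how the same gzip logic could be wrapped in a handler. It assumes the function is triggered by an S3 put event for .gz objects; the output key naming (stripping the .gz suffix) is just an assumption.

import boto3
import gzip
from io import BytesIO
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # sketch only: assumes an S3 put trigger delivering .gz objects
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        gzipped_key = unquote_plus(record['s3']['object']['key'])
        # output key naming is an assumption: strip the .gz suffix
        if gzipped_key.endswith('.gz'):
            uncompressed_key = gzipped_key[:-3]
        else:
            uncompressed_key = gzipped_key + '.unzipped'
        # read the compressed object into memory, then upload the decompressed stream
        compressed = BytesIO(s3.get_object(Bucket=bucket, Key=gzipped_key)['Body'].read())
        s3.upload_fileobj(
            Fileobj=gzip.GzipFile(None, 'rb', fileobj=compressed),
            Bucket=bucket,
            Key=uncompressed_key)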
The above answers are for gzip files; for zip files, you may try:
import boto3
import zipfile
from io import BytesIO

bucket = 'bucket1'
s3 = boto3.client('s3', use_ssl=False)
Key_unzip = 'result_files/'

prefix = "folder_name/"
zipped_keys = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
file_list = []
for key in zipped_keys['Contents']:
    file_list.append(key['Key'])
# This will give you a list of files in the folder you mentioned as prefix

s3_resource = boto3.resource('s3')
# Now create the zip object one by one; this below is for the 1st file in file_list
zip_obj = s3_resource.Object(bucket_name=bucket, key=file_list[0])
print(zip_obj)
buffer = BytesIO(zip_obj.get()["Body"].read())

z = zipfile.ZipFile(buffer)
for filename in z.namelist():
    file_info = z.getinfo(filename)
    s3_resource.meta.client.upload_fileobj(
        z.open(filename),
        Bucket=bucket,
        Key='result_files/' + f'{filename}')
This will work for your zip file, and your resulting unzipped data will be in the result_files folder. Make sure to increase the memory and timeout on AWS Lambda to the maximum, since some files are pretty large and need time to write.
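For reference, the memory and timeout can also be raised programmatically with boto3; a minimal sketch is below, where the function name and the specific values are just placeholders for your own settings.

import boto3

# raise the Lambda function's memory and timeout so large archives can be processed
lambda_client = boto3.client('lambda')
lambda_client.update_function_configuration(
    FunctionName='unzip-function',   # hypothetical function name
    MemorySize=3008,                 # MB, example value
    Timeout=900)                     # seconds (15 minutes, the Lambda maximum)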