Reading the contents of a gzip file from AWS S3 in Python

Question

I am trying to read some logs from a Hadoop process that I run in AWS. The logs are stored in an S3 folder and have the following path.

bucketname = name
key = y/z/stderr.gz

Here y is the cluster id and z is a folder name. Both of these act as folders (objects) in AWS, so the full path looks like x/y/z/stderr.gz.

Now I want to unzip this .gz file and read its contents. I don't want to download the file to my system; I want to keep the contents in a Python variable.

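This is what I have tried so far.

import boto3

s3 = boto3.resource("s3")
bucket_name = "name"
key = "y/z/stderr.gz"
obj = s3.Object(bucket_name, key)
# Read the whole object body into memory as raw (still compressed) bytes
n = obj.get()['Body'].read()

This gives me output that is not readable. I also tried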

n = obj.get()['Body'].read().decode('utf-8')

which gives the error 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte.
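The byte 0x8b at position 1 is a hint about what went wrong: every gzip stream starts with the two bytes 0x1f 0x8b, so the data is still compressed and cannot be decoded as text yet. A minimal sketch of that check, where `looks_like_gzip` is a hypothetical helper name and locally compressed bytes stand in for the data read from S3:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


# Stand-in for the raw bytes returned by obj.get()['Body'].read()
raw = gzip.compress(b"some log line\n")

print(looks_like_gzip(raw))                 # True
print(looks_like_gzip(b"some log line\n"))  # False

# Decoding the compressed bytes directly fails at the second magic byte.
try:
    raw.decode("utf-8")
    error_position = None
except UnicodeDecodeError as exc:
    error_position = exc.start

print(error_position)  # 1
```

The data has to be decompressed first and decoded second.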

I have also tried

gzip = StringIO(obj)
gzipfile = gzip.GzipFile(fileobj=gzip)
content = gzipfile.read()

This raises IOError: Not a gzipped file

I am not sure how to decode this .gz file.

Edit: Found a solution. I needed to pass n (the raw bytes) instead of obj, and to use BytesIO instead of StringIO:

gzip = BytesIO(n)
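A minimal sketch of that fix, assuming Python 3. The helper name `gunzip_bytes` is hypothetical, and the bytes are compressed locally as a stand-in for `n` so the example runs without S3 access. It also avoids naming a variable `gzip`, which would shadow the module:

```python
import gzip
import io


def gunzip_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip-compressed bytes in memory and decode them to text."""
    # Wrap the raw bytes in a file-like object so GzipFile can read from it.
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as gz:
        return gz.read().decode(encoding)


# Stand-in for n = obj.get()['Body'].read()
n = gzip.compress(b"INFO job started\nWARN low memory\n")

content = gunzip_bytes(n)
print(content)

# For bytes that are already in memory, gzip.decompress is a shorter equivalent.
same_content = gzip.decompress(n).decode("utf-8")
```

Nothing is written to disk in either variant; the decompressed text lives only in the `content` variable.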

Kirk · Accepted Answer

This is old, but you no longer need the intermediate BytesIO object (at least with boto3==1.9.223 and Python 3.7):

import boto3
import gzip

s3 = boto3.resource("s3")
obj = s3.Object("YOUR_BUCKET_NAME", "path/to/your_key.gz")
with gzip.GzipFile(fileobj=obj.get()["Body"]) as gzipfile:
    content = gzipfile.read()
print(content)
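Note that `content` above is a bytes object. For log files it can be handy to get decoded text one line at a time instead. A rough sketch of that variation, assuming the response body can be read as a binary file-like object as in the answer above; `iter_gzip_lines` is a hypothetical helper, and an in-memory `BytesIO` stands in for `obj.get()["Body"]` so the example runs without AWS credentials:

```python
import gzip
import io


def iter_gzip_lines(fileobj, encoding="utf-8"):
    """Yield decoded lines from a gzip-compressed binary file-like object."""
    # GzipFile decompresses on the fly; TextIOWrapper decodes bytes to str
    # and splits the stream into lines lazily.
    with io.TextIOWrapper(gzip.GzipFile(fileobj=fileobj), encoding=encoding) as text:
        for line in text:
            yield line.rstrip("\n")


# Stand-in for obj.get()["Body"]
fake_body = io.BytesIO(gzip.compress(b"first line\nsecond line\n"))

lines = list(iter_gzip_lines(fake_body))
print(lines)  # ['first line', 'second line']
```

Because the lines are produced lazily, a large log does not have to be held in memory as one decompressed string.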

Tags:

python

amazon-web-services

amazon-s3

boto3

Kshitij Marwah

1 Answer

Kirk
