 

How can I decode a .gz file from S3 using an AWS Lambda function?

I have AWS Config sending snapshots of my AWS system to an S3 bucket every 12 hours. They are JSON files that are stored in a .json.gz format that contain information about the entire AWS system. On object creation in the bucket, a Lambda function is triggered to read that file. My plan is to read the JSON information in the function, parse through the data and create reports that describe certain elements of the AWS system, and push those reports to another S3 bucket.

My current code is:

# 's3' here is a boto3 S3 client created earlier in the handler
data = s3.get_object(Bucket=bucket, Key=key)
text = data['Body'].read().decode('utf-8')
json_data = json.loads(text)

The error I am currently getting is: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte

My guess is that this error means certain bytes in data['Body'] are not valid UTF-8 text. Clearly I cannot decode it as UTF-8, so I would like to unzip the .gz file instead. Is there a way to do this? I have already looked into the zipfile module, but I couldn't find anything that fits my use case. Thanks.

asked Oct 28 '25 by prudent4


1 Answer

You're correct - you can't decode this into text. The body is gzip-compressed: 0x8b is the second byte of the gzip magic number (0x1f 0x8b), which is why the UTF-8 decode fails. You'll want something like:

import io
import gzip
import json

import boto3
from urllib.parse import unquote_plus

def handler_name(event, context):
    s3client = boto3.client('s3')
    for record in event['Records']:
        # An S3 event can carry several records; object keys arrive URL-encoded.
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])

        # Fetch the raw gzip-compressed bytes, then decompress and parse the JSON.
        response = s3client.get_object(Bucket=bucket, Key=key)
        content = response['Body'].read()
        with gzip.GzipFile(fileobj=io.BytesIO(content), mode='rb') as fh:
            yourJson = json.load(fh)
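Since the whole object is already read into memory, a slightly shorter, equivalent option (a sketch under the same assumptions, standard library only) is to skip the file wrapper and decompress the bytes directly:

content = response['Body'].read()
# gzip.decompress returns the uncompressed bytes; json.loads accepts bytes directly
yourJson = json.loads(gzip.decompress(content))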

You can then work with the parsed data in the yourJson variable.
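From there, building the reports the question describes is mostly ordinary dict handling plus a put_object call. A minimal sketch, assuming the Config snapshot exposes its resources under a configurationItems list (as AWS Config snapshots typically do) and using a hypothetical REPORT_BUCKET name:

# Count resources per type - a stand-in for whatever report you actually need.
counts = {}
for item in yourJson.get('configurationItems', []):
    resource_type = item.get('resourceType', 'UNKNOWN')
    counts[resource_type] = counts.get(resource_type, 0) + 1

# Write the report next to the original key, in a separate (hypothetical) bucket.
report_key = key.replace('.json.gz', '-report.json')
s3client.put_object(
    Bucket='REPORT_BUCKET',  # replace with your report bucket
    Key=report_key,
    Body=json.dumps(counts).encode('utf-8'),
    ContentType='application/json',
)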

answered Oct 31 '25 by stdunbar